From e79a4f41f07b459e3682cce8b57a67652b5b78f2 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Thu, 9 Mar 2023 18:50:35 +0100 Subject: [PATCH 01/32] IPIP: compact denylist format --- COMPACT_DENYLIST_FORMAT.md | 338 ++++++++++++++++++++++++++++ IPIP/383-compact-denylist-format.md | 110 +++++++++ 2 files changed, 448 insertions(+) create mode 100644 COMPACT_DENYLIST_FORMAT.md create mode 100644 IPIP/383-compact-denylist-format.md diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md new file mode 100644 index 000000000..20e945cca --- /dev/null +++ b/COMPACT_DENYLIST_FORMAT.md @@ -0,0 +1,338 @@ +# Specification Template + +![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) + +**Author(s)**: +- @hsanjuan + +**Maintainer(s)**: +- @hsanjuan + +* * * + +**Abstract** + +This is the specification for [compact denylist format V1](IPIP/383-compact-denylist-format.md). + +Denylists provide a way to indicate what content should be blocked by IPFS. + +## Organization of this document + +- [Introduction](#introduction) +- [Specification](#specification) + - [Test fixtures](#test-fixtures) + - [Security](#security) + - [Privacy and User Control](#privacy-and-user-control) + +## Introduction + +A denylist is a collection of items that will be "blocked" on IPFS software. + +While this specification is implementation agnostic and just defines the form +and syntax supported by the denylist, it is clear that when talking of +blocking we are specifically thinking of how to implement it in a way that is +efficient and operationally sound. Thus, we are thinking of lists that will +grow to be made of billions of items, that will be constantly updated while +the application runs, that will be shared and distributed around using IPFS +itself and that users should have the power to edit and adjust very easily. + +The presented denylist format is the result of careful reflection on such +terms.
Our list format starts by including a **header**, which provides basic +information about the list itself, and can be used to set list-wide options +(*hints*, as we call them). We choose YAML for simplicity, readability, ease of +use and parser support. + +In our lists, *hints* are a way of providing additional, optional information, +relative to the items in the list that can be processed by machines. For +example, a hint can tell implementations about HTTP return codes for blocked +items, when they are requested through the gateway. A hint can provide a +reason, or specify deviations from defaults. While there will be a minimal +number of specified hints, users can include custom ones and parsers can +implement functionality accordingly even when not part of the base specification. + +The denylist itself, after the header, is a collection of **block items** and +block-item-specific hints. There are different flavours of block items, +depending on whether we are blocking by CID, CID+path, IPNS, using +double-hashing etc. but the idea is that whether an item is blocked or not can +be decided directly and ideally, prior to retrieval. + +We include *negative block items* as well, with the idea of enabling denylists +that are append-only. One of the main operational constraints we have seen is +that a single item can cause a full denylist to be re-read, re-parsed and +ultimately need a full restart of the application. We want to avoid that by +providing operators and implementors with the possibility of just watching +denylists for new items without the need to restart anything while new items +are added. This also gives the possibility of storing an offset and seeking +directly to it after application restarts. + +Another aspect that we have maintained in the back of our minds is the +possibility of sharing lists using IPFS. The append-mostly aspect also plays a +role here, for lists can be chunked and DAG-ified and only the last chunk will +change as the file grows.
This makes our lists immediately friendly to +content-addressing and efficient transmission over IPFS. However, the +protocols, subscriptions and list-sharing approaches are rightfully beyond +this spec. + +Beyond all of that, we put emphasis on making our format easily editable by +users and facilitating integrations using scripts and with other applications +(unrelated to the implementation of the parsing/blocking inside IPFS). We +consciously avoid JSON and other machine formats and opt for text and for +space-delimited items in a grep/sed/cut-friendly way. For example, we expect +that the following should just work across implementations for blocking +something new: + +``` +echo /ipfs/Qmcid >> ~/.config/ipfs/custom.deny +``` + +We consciously avoid defining any other API other than expecting +implementations to honor blocking what is on the denylist and act accordingly +when it is updated. Thus, we do not require implementations to provide an HTTP +endpoint to modify list items etc. that is outside the scope of this spec, and +entirely dependent on what each implementation wants to do and how they want +to do it. + +As a last note, if we take Kubo and the go-ipfs stack as the reference IPFS +implementation, we expect the blocking-layer (that is, the introduction of the +logic that decides whether an item is blocked or not), to happen cleanly at +the `Resolver` and `BlockService` interfaces. + +This specification corresponds to V1 of the compact list format. We have +limited the number of features and extensions to a minimum to start working +with, leaving some ideas on the table and the door open to develop the format +in future versions. + +## Specification + +### Denylist file extension and locations + +While not pertaining to the denylist format itself, we introduce the following conventions about denylist files when they are stored in the local filesystem: + +- Denylist files are named with the extension `.deny`.
+- Implementations should look in `/etc/ipfs/denylists/` and `$XDG_CONFIG_HOME/ipfs/denylists/` for denylist files. +- Denylist files are processed in alphabetical order so that rules from later denylists override rules from earlier denylists on conflict. + +### Denylist format + +#### Summary + +The following example showcases the features and syntax of a compact denylist: + +``` +version: 1 +name: IPFSorp blocking list +description: A collection of bad things we have found in the universe +author: abuse-ipfscorp@example.com +hints: + gateway_status: 410 + double_hash_fn: sha256 + double_hash_enc: hex +--- +/ipfs/QmYvggjprWhRYiDhyZ57gtkadEBhcfPScGyx1AofkgAk3Q reason:DCMA +/ipfs/bafkreigtnn3j24rs5q2qhx3kleisjngot5w2lgd32armqbv2upeaqesrna +/ipfs/bafkreifhlk37n6gcnt6pjmvdtqdzxrok35wh46jjobrqqtqckbn4ygk3yy/dirty%20movies/xxx.mp4 +/ipfs/bafkreidxe6kfaurhhxzkh6wsvbqwzcu5eluwm57a62gftxwt6w4zuiljte/* +/ipfs/bafkreigtdosqa2q542lhmt74aprtjsomobar6x3gp3zlrwdnyh56euphay/pics/secret* +/ipns/example.com gateway_status:410 +/ipns/QmdxLxa4Sz6ygEhL9FKwfrknL9xXoeFJRFCDS8bQwFmFDz +/ipns/example.com/hidden/* +//f36d4ce6cf64f2aac2c8cab023be1af1842681bad77fb3b379740e2f76f10a31 +/mime/* +-/mime/txt +``` + +#### High level list format + +A denylist is made of an optional header and a list of blockitems separated by newlines. + +``` +
+--- + [hint_list] + [hint_list] +... +``` + +#### Header + +The list header is a YAML block: + +- Must be valid YAML +- Fully optional +- 1KB maximum size +- Delimited by a line containing `---` at the end (document separator) + +Known-fields: + +- `version`: the denylist format version. Defaults to 1 when not specified. +- `name` +- `description` +- `author` +- `hints`: a map of *hints*. See section below for known hints + +#### Hints + +A *hint* is a key-value duple associated to the denylist as a whole (part of the header), or to a specific \. + +Known hints: + +- `double_hash_fn`: the multicodec string for the hashing function used for double-hashing. **Default**: `sha2-256` +- `double_hash_base`: the multibase string for the encoding if the double-hashing function result. **Default**: `base16`. + +#### List body + +A denylist is made of lines which are made by a *block items* followed by zero or more space-separated hints. + +Lines should not contain more than 2MiB of data. + +#### Block item + +A block item represents a rule to enable content-blocking: +- `` elements are expected to be %-encoded, per [RFC 3986, section 2.1](https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding). + +##### `/ipfs/\` + +Blocks a specific multihash. If the CID is a CIDv1, it blocks the +multihash. Blocking directly by multihash must be done using CIDv0s (that is, +base58btc-encoded multihashes). This does not prevent resolution of sub-paths starting at this CID. + +Blocking layer recommendation: BlockService. + +##### `/ipfs/\/\` + +Blocks the exact ipfs path that is referenced from the multihash embedded the +CID before attempting to resolve it. It does not block the CID that the path resolves to. + +Blocking layer recommendation: Resolver. + +##### `/ipfs/\/*` `/ipfs/\/\* + +Blocks any multihash-path combination starting with the the given path prefix. `/*` includes the empty path. Thus `/ipfs//*` blocks the CID itself, and any paths. 
Examples: + +- `/ipfs//*` : blocks CID (by multihash) and any path before resolving. +- `/ipfs//ab*`: blocks any path derived from the CID (multihash) and starting with "ab", including "ab" +- `/ipfs//ab/*`: equivalent to the above. + +Blocking layer recommendation: Resolver + (BlockService if the CID itself is blocked too). + +##### `/ipns/\` + +Blocks the given IPNS before resolving. It does not block the CID that it resolves to. + +Blocking layer recommendation: Resolver. + +##### `/ipns/\/\` + +Blocks specifically the IPNS path, before resolving. + +Blocking layer recommendation: Resolver. + +##### `/ipns/\/*` `/ipns/\/\*` + +Same as with the `/ipfs/` rule, blocks IPNS paths starting with the given path prefix. `/*` is equivalent to the empty string, so `/abc/*` == `/abc*`. + +Blocking layer recommendation: Resolver. + +##### `/\` `/\/*` `/\*` + +Block solely by looking at the path component, and ignoring the CID/IPNS parts, before resolving. + +This blocks all the paths matching exactly or having the same prefix as the one in the rule: + +- `/my/path`: blocks any item that tries to resolve `/my/path`, regardless of the CID used. +- `/my/path*` and `/my/path/*`: blocks any paths that contain the prefix `/my/path`. + +Blocking layer recommendation: Resolver. + +##### `//\` + +Blocks a double-hashed item, which can be: + +- The hash of a CIDv1base32[+path]: legacy badbits, block-by-cid format +- The hash of an IPNS path `/ipns/*`. +- The hash of a CIDv0[+path] + +Blocking layer recommendation: Resolver + BlockService. + +In order to check for a matching rule, the Resolver should: + +- IPFS path: convert the CID to v1base32 and hash the path without the `/ipfs/` prefix. +- IPFS path: convert the CID to v0 and hash the path without the `/ipfs/` prefix. +- IPNS path: hash the path "as is". 
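As an illustration of the Resolver-side check listed above, the following Python sketch computes the double-hash of an IPFS path and of an IPNS path. It is only a sketch under stated assumptions, not a reference implementation: the function names are hypothetical, it uses the default `sha2-256` function with `base16` (hex) output, it expects a CID that has already been converted to CIDv1 base32 by other means, and it assumes that the `CID/PATH` string keeps the separating slash even when the path is empty.

```python
import hashlib


def ipfs_double_hash(cid_v1_base32: str, path: str = "") -> str:
    """Hex-encoded SHA2-256 of `CID/PATH`, without the `/ipfs/` prefix."""
    # Assumption: the separating slash is kept even when PATH is empty.
    anchor = f"{cid_v1_base32}/{path.lstrip('/')}"
    return hashlib.sha256(anchor.encode("utf-8")).hexdigest()


def ipns_double_hash(ipns_path: str) -> str:
    """Hex-encoded SHA2-256 of an IPNS path, hashed "as is"."""
    return hashlib.sha256(ipns_path.encode("utf-8")).hexdigest()


def is_double_hash_blocked(denied: set, cid_v1_base32: str, path: str = "") -> bool:
    """Check a request against the values of `//` rules (prefix already stripped)."""
    return ipfs_double_hash(cid_v1_base32, path) in denied
```

Keeping the denied values in a set makes each lookup a single hash computation plus a constant-time membership test, which matters because every request has to be hashed once a list contains double-hashed items.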
+ +The Blockservice should, in turn do the following to check for matches: + +- Convert the CID to v1base32 (keeping the codec) and hash the CID string +- Convert the CID to v0 and hash the CID string + +When blocking by double-hashing the recommendation is to use the result of hasing `[/]`. This ensures that blocking by multihash happens. + +##### `/mime/\` `/mime/*` + +Blocks content detected to be of the given type. `/mime/*` blocks all the mimetypes and is meant to work with allow rules (all mimetypes blocked except specific ones). + +Blocking layer recommendation: Unixfs + +Our recommendation is that /mime/ rules automatically set IPFS clients into a +"unixfs only" mode where only unixfs (+raw blocks) are allowed at the +BlockService layer, and content type is checked at the Unixfs layer, as the +blocks get assembled into an actual files. That should cover gateway usage. + +#### Allow block items + +Block items can be prepended by `+`, signaling that they are to be allowed and +triumphing over other negative entries. Implementations should check first if +items have been allowed, before processing blocking rules. Examples: + +``` +/mime/* ++/mime/text/plain +/ipfs//photo* ++/ipfs//photo123.jpg +``` + +#### Negative block items + +Block items can be prepended `-`, signaling that they undo a block item found +previously on the list. This allows to remove entries from a list by just +negating them in an append-only fashion. + +#### Hint list + +A hint list is an optional space-separated list of hints associated with specific block items in the form: + +``` + hintA:v1 hintB:v2 hintC:v3 +``` + +### Test fixtures + +TODO + +### Security + +This proposal takes into account security: + +- Denylist headers and line-length limits are well specified to avoid malformed lists to cause things like large memory usage while parsing. +- Supported type of blocks have been though out to avoid amplified consumption of resources or side effects (i.e. 
downloading of additional dag-blocks) during the implementation. +- Paths are sanitized and follow the same encoding rules as URLs (RFC 3986), so that existing and safe parsing can be done with regular tooling. +- Official and custom-hint systems allow the introduction of additional features that can co-exist with the specified format without needing to be supported. + +### Privacy and User Control + +The main aspect regarding privacy in the scope of this specification has to do +with supporting the use of double-hashing in block items. + +Double-hashing is particularly useful when the denylist is meant to be shared. Double-hashing: + +- Prevents readers of the denylist to know what the original content-address + of the block item is, and therefore avoids making the denylist a directory + of *bad* content. This is particularly useful for harmful content, where + solely accessing it is bad. +- Double-hashing does not exclude adding additional context via comments or hints +- The presence of a single double-hashed block item makes necessary that the implementation hashes every CID and CID+path that needs to be checked, which has a performance impact. +- In general, it is good that users can inspect the nature of the content blocked if they wish to, so we recommend not using double-hashing by default as it helps transparency (i.e. blocking due to copyright claims). + +## Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). 
diff --git a/IPIP/383-compact-denylist-format.md b/IPIP/383-compact-denylist-format.md new file mode 100644 index 000000000..a4efb1e51 --- /dev/null +++ b/IPIP/383-compact-denylist-format.md @@ -0,0 +1,110 @@ +# IPIP-383: Compact denylist format + +- Start Date: 2023-03-09 +- Related Issues: + - A different proposal: https://github.com/ipfs/specs/pull/340 + +## Summary + +This IPIP introduces a line-based denylist format for content blocking on IPFS +focused on simplicity, scalability and ease-of-use, + +## Motivation + +IPFS implementations should support content moderation, particularly when it +comes to deployments of publicly-accessible infrastructure like gateways. + +The first step in a larger strategy to enable decentralized content moderation +in IPFS setups is to agree in a denylist format that different implementations +can rely on and share. + + +## Detailed design + +See [Compact Denylist Format](../COMPACT_DENYLIST_FORMAT.md). + +## Design rationale + +This proposal introduces a new denylist format which aims to fulfil the +following aspects, which are a must for such a system: + +* Efficient parsing at scale. Compact. +* Simplicity and extensibility for extra features, both in future versions of + the spec and in custom systems. +* Easy to read and to understand. +* Integration-ready: Avoid the requirement of custom tooling or implementation. + support to manage denylists. Text-file operations as interface for + list-editing. +* Support the necessary types of blocks (by cid, by path, double-hash etc.) + needed by users and operators. +* IPFS and DAGification friendly. 
+ +The proposed design is part of a holistic approach to content-moderation for IPFS for which we have the following detailed wishlist of items ultimately related to the denylist format: + +- Regarding the type of blocking: + - Ability to block content from being retrieved, stored or served by multihash + - Ability to block content that is referenced with an IPFS-path from a blocked multihash or traversing a blocked multihash. + - Ability to block by regexp-matching an IPFS path + - Ability to block based on content-type (i.e. only store/serve plain-text,and pictures) + - Ability to block based on CID codec (only allow Codec X) + - Ability to block based on multihash function (”no identity multihashes”) + - Ability to block IPNS names + +- Regarding the lists: + - Compact format, compression friendly + - Line-based so that updates can be watched + - Lists support CIDs + - Lists support CIDs+path (explicit) + - Lists support CIDs+path (implicit - everything referenced from CID) + - Lists support double-hashed multi-hashes + - Lists support double-hashed cid+path (current badbits format) + - Lists can be edited by hand on a text editor + - Lists are ipfs-replication-friendly (adding a new entry does not require downloading more than 1 IPFS block, to sync the list). + - Lists support comments + - Lists support gateway http error hints (i.e. type of block) + - `echo "/ipfs/cid" >> ~/.config/ipfs/denylists/custom` should work + - Lists have a header section with information about the list. + +- Regarding the implementation: + - Multiple denylists should be supported + - Hot-reloading of list (no restart of IPFS required) + - List removal does not require restart + - Minimal introduction of latency + - Minimal memory footprint (i.e. only read minimum amount of data into memory) + - Clean denylist module entrypoints (easy integration in current ipfs stack layers) + - Portable architecture (to other IPFS implementations). i.e. 
good interfaces to switch from an embedded implementation to something that could run separately, or embedded in other languages (i.e. even servicing multiple ipfs daemons). + - Text-based API. `ipfs deny ` and the like are nice-to-have but not a must to work with denylists. + - Security in mind: do not enable amplification attacks through lists (i.e. someone requesting a recursively blocked CID repeteadly over the gateway endpoint causes traversal of the whole CID-DAG. + +- Regarding list distribution: + - Ability to subscribe to multiple lists, and fetch any updates as they happen + - Ability to publish own lists so that others can subscribe to them + - List-subscription configuration or file details remote lists that the user is subscribed to. Editable by hand. + - Ability to subscribe to list subscriptions. + - List subscriptions can carry context (i.e. publisher, email, type of blocking. + +### User benefit + +Users and developers will benefit from a list format that is easy to work with because: + +* It can be understood by just looking at it. +* It can be edited by hand. +* Implementations can choose to support different aspects (i.e. blocking but no optional hints). +* Denylist parsers are easy and stupid. + +### Compatibility + +The current Protocol Labs denylist format +https://badbits.dwebops.pub/denylist.json can be easily converted into the +proposed compact format. + + +### Alternatives + +This proposal is a follow up to a [previous proposal](https://github.com/ipfs/specs/pull/340), which has several shortcomings that make it not very practical when working at scale. Both list formats can co-exist though but ultimately it will be a matter of implementation support, and it would be better to settle on one thing. + +It is also a followup on the "badbits" denylist format, which has similar issues and is not flexible enough. + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). 
From 2c17dd270f3b86a4c2f19bd11a32150dca513200 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 28 Mar 2023 22:13:23 +0200 Subject: [PATCH 02/32] Improve IPIP-383 spec proposal Clarify a number of things, provide more details. --- COMPACT_DENYLIST_FORMAT.md | 232 +++++++++++++++++++++++-------------- 1 file changed, 148 insertions(+), 84 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 20e945cca..1770bb777 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -37,22 +37,21 @@ the application runs, that will be shared and distributed around using IPFS itself and that users should have the power to edit and adjust very easily. The presented denylist format is the result of careful reflection on such -terms. Our list format starts by including a **header**, which provides basic -information about the list itself, and can be used to set list-wide options -(*hints*, as we call them). We choose YAML for simplicity, readability, ease of -use and parser support. +terms. Our list format starts by including an optional **header**, which +provides basic information about the list itself, and can be used to set +list-wide options (*hints*, as we call them). We choose YAML for simplicity, +readability, ease of use and parser support. In our lists, *hints* are a way of providing additional, optional information, relative to the items in the list that can be processed by machines. For example, a hint can tell implementations about HTTP return codes for blocked -items, when they are requested through the gateway. A hint can provide a -reason, or specify deviations from defaults. While there will be a minimal -number of specified hints, users can include custom ones and parsers can -implement functionality accordingly even when not part of the base specification. +items, when they are requested through the gateway. 
In this original +specification we do not define any mandatory or optional hint, but this may be +done in the future to support specific features. The denylist itself, after the header, is a collection of **block items** and block-item-specific hints. There are different flavours of block items, -depending on whether we are blocking by CID, CID+path, IPNS, using +depending on whether we are blocking by CID, CID+path, Path, IPNS, using double-hashing etc. but the idea is that whether an item is blocked or not can be decided directly and ideally, prior to retrieval. @@ -63,7 +62,8 @@ ultimately need a full restart of the application. We want to avoid that by providing operators and implementors with the possibility of just watching denylists for new items without the need to restart anything while new items are added. This also gives the possibility of storing an offset and seeking -directly to it after application restarts. +directly to it after application restarts. *Negative block items* can also be +used to make exceptions to otherwise more general rules. Another aspect that we have maintained in the back of our minds is the possibility of sharing lists using IPFS. The append-mostly aspect also plays a @@ -78,24 +78,24 @@ users and facilitating integrations using scripts and with other applications (unrelated to the implementation of the parsing/blocking inside IPFS). We consciously avoid JSON and other machine formats and opt for text and for space-delimited items in a grep/sed/cut-friendly way.
For example, we expect -that the following should just work across implementations for blocking -something new: +that the following should just work across implementations for adding and +blocking something new: ``` -echo /ipfs/Qmcid >> ~/.config/ipfs/custom.deny +echo /ipfs/QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768 >> ~/.config/ipfs/custom.deny ``` We consciously avoid defining any other API other than expecting implementations to honor blocking what is on the denylist and act accordingly -when it is updated. Thus, we do not require implementations to provide an HTTP -endpoint to modify list items etc. that is outside the scope of this spec, and -entirely dependent on what each implementation wants to do and how they want -to do it. +when it is updated. CLI commands or API endpoints to modify list items etc. are +outside the scope of this spec. Implementations decide how much information to +provide to users when a request for an IPFS object is blocked. As a last note, if we take Kubo and the go-ipfs stack as the reference IPFS implementation, we expect the blocking-layer (that is, the introduction of the logic that decides whether an item is blocked or not), to happen cleanly at -the `Resolver` and `BlockService` interfaces. +the `NameSystem`, `path.Resolver` and `BlockService` interfaces (IPNS, IPFS +Path and CID blocks respectively). This specification corresponds to V1 of the compact list format. We have limited the number of features and extensions to a minimum to start working @@ -104,13 +104,15 @@ in future versions. ## Specification -### Denylist file extension and locations +### Denylist file extension, locations and order While not pertaining to the denylist format itself, we introduce the following conventions about denylist files when they are stored in the local filesystem: - Denylist files are named with the extension `.deny`.
-- Denylist files are processed in alphabetical order so that rules from later denylists override rules from earlier denylists on conflict. +- Implementations should look in `/etc/ipfs/denylists/` and + `$XDG_CONFIG_HOME/ipfs/denylists/` for denylist files. +- Denylist files are processed in alphabetical order so that rules from later + denylists override rules from earlier denylists on conflict. ### Denylist format @@ -120,13 +122,12 @@ The following example showcases the features and syntax of a compact denylist: ``` version: 1 -name: IPFSorp blocking list +name: IPFSCorp blocking list description: A collection of bad things we have found in the universe author: abuse-ipfscorp@example.com hints: gateway_status: 410 - double_hash_fn: sha256 - double_hash_enc: hex + enable_legacy_doublehash: true --- /ipfs/QmYvggjprWhRYiDhyZ57gtkadEBhcfPScGyx1AofkgAk3Q reason:DCMA /ipfs/bafkreigtnn3j24rs5q2qhx3kleisjngot5w2lgd32armqbv2upeaqesrna @@ -162,7 +163,7 @@ The list header is a YAML block: - 1KB maximum size - Delimited by a line containing `---` at the end (document separator) -Known-fields: +Known-fields (they must be lowercase): - `version`: the denylist format version. Defaults to 1 when not specified. - `name` @@ -174,100 +175,153 @@ Known-fields: A *hint* is a key-value duple associated to the denylist as a whole (part of the header), or to a specific \. -Known hints: - -- `double_hash_fn`: the multicodec string for the hashing function used for double-hashing. **Default**: `sha2-256` -- `double_hash_base`: the multibase string for the encoding if the double-hashing function result. **Default**: `base16`. +Header hints can be used to set denylist-wide options or information that +implementations can choose to interpret or not. #### List body A denylist is made of lines which are made by a *block items* followed by zero or more space-separated hints. -Lines should not contain more than 2MiB of data. +Lines should not be longer than 2MiB including the "\n" delimiter. 
#### Block item A block item represents a rule to enable content-blocking: -- `` elements are expected to be %-encoded, per [RFC 3986, section 2.1](https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding). -##### `/ipfs/\` + +- `PATH` elements are expected to be %-encoded, per [RFC 3986, section 2.1](https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding). +- `CID` elements represent a CID (either V0 or V1). +- `CIDv0` elements are for us equivalent to base58btc-encoded sha256 multihashes, although they are not the same thing (a CIDv0 carries implicit codec (dag-pb) and multibase (b58btc) information). When we say a b58-encoded multihash needs to be extracted from the CID, this usually is a no-op in the case of CIDv0s. + +##### `/ipfs/CID` + +CID-rule: Blocks a specific multihash. If the CID is a V1, it blocks the +multihash contained in it (CIDv0s are multihashes already). -Blocks a specific multihash. If the CID is a CIDv1, it blocks the -multihash. Blocking directly by multihash must be done using CIDv0s (that is, -base58btc-encoded multihashes). This does not prevent resolution of sub-paths starting at this CID. +When users want to block by multihash directly, they must use base58btc-encoded +multihashes. This rule does not block subpaths that start at this CID, only +the CID itself. Blocking layer recommendation: BlockService. -##### `/ipfs/\/\` +##### `/ipfs/CID/PATH` -Blocks the exact ipfs path that is referenced from the multihash embedded the -CID before attempting to resolve it. It does not block the CID that the path resolves to. +IPFS-Path-Rule: Blocks the exact ipfs path that is referenced from the +multihash embedded in the CID before attempting to resolve it. It does not block +the CID that the path resolves to. -Blocking layer recommendation: Resolver. +Note `/ipfs/CID/path` and `/ipfs/CID/path/` are equivalent rules. -##### `/ipfs/\/*` `/ipfs/\/\* +Blocking layer recommendation: PathResolver.
-Blocks any multihash-path combination starting with the the given path prefix. `/*` includes the empty path. Thus `/ipfs//*` blocks the CID itself, and any paths. Examples: +##### `/ipfs/CID/*` `/ipfs/CID/P/A/T/H*` -- `/ipfs//*` : blocks CID (by multihash) and any path before resolving. -- `/ipfs//ab*`: blocks any path derived from the CID (multihash) and starting with "ab", including "ab" -- `/ipfs//ab/*`: equivalent to the above. +IPFS-Path-Prefix-Rule: Blocks any multihash-path combination starting with +the given path prefix. `/*` includes the empty path. Thus `/ipfs/CID/*` +blocks the CID itself, and any paths. Examples: -Blocking layer recommendation: Resolver + (BlockService if the CID itself is blocked too). +- `/ipfs/CID/*` : blocks CID (by multihash) and any path before resolving. +- `/ipfs/CID/ab*`: blocks any path derived from the CID (multihash) and starting with "ab", including "ab" +- `/ipfs/CID/ab/*`: equivalent to the above. -##### `/ipns/\` +Blocking layer recommendation: PathResolver + (BlockService if the CID itself is blocked too). -Blocks the given IPNS before resolving. It does not block the CID that it resolves to. +##### `/ipns/IPNS` -Blocking layer recommendation: Resolver. +IPNS-rule: Blocks the given IPNS name before resolving. It does not block the CID that it +resolves to. -##### `/ipns/\/\` +If the IPNS name is a domain name, it is blocked directly. -Blocks specifically the IPNS path, before resolving. +If the IPNS name is a CIDv1 (libp2p-key) or b58-encoded-multihash (CIDV0), +then the blocking affects the underlying Multihash. -Blocking layer recommendation: Resolver. +Blocking layer recommendation: NameSystem. -##### `/ipns/\/*` `/ipns/\/\*` +##### `/ipns/IPNS/PATH` -Same as with the `/ipfs/` rule, blocks IPNS paths starting with the given path prefix. `/*` is equivalent to the empty string, so `/abc/*` == `/abc*`. +IPNS-Path-rule: Blocks specifically the IPNS path, before resolving. Equivalent to `/ipfs/CID/PATH`.
-Blocking layer recommendation: Resolver. +Blocking layer recommendation: There is no good place to implement this rule +as the NameSystem only handles IPNS names (without paths), and the +path.Resolver only handles already-resolved Paths. -##### `/\` `/\/*` `/\*` +##### `/ipns/NAME/*` `/ipns/NAME/PATH*` -Block solely by looking at the path component, and ignoring the CID/IPNS parts, before resolving. +IPNS-Path-Prefix-Rule: Same as with the IPFS-Path-Prefix-Rule. -This blocks all the paths matching exactly or having the same prefix as the one in the rule: +Blocking layer recommendation: There is no good place to implement this rule +as the NameSystem only handles IPNS names (without paths), and the +path.Resolver only handles already-resolved Paths. + +##### `/PATH` `/PATH/*` `/PATH*` + +Subpath-Rule: Block solely by looking at the subpath component of an IPFS path. Examples: - `/my/path`: blocks any item that tries to resolve `/my/path`, regardless of the CID used. - `/my/path*` and `/my/path/*`: blocks any paths that contain the prefix `/my/path`. -Blocking layer recommendation: Resolver. +Blocking layer recommendation: PathResolver. + +##### `//DOUBLE_HASH` + +Doublehash-Rule: Blocks using double-hashed item, which can be: + +- The sha256-hex-encoded hash of `CIDV1_BASE32/PATH`: this is the legacy + badbits block anchor format. It can only block by CID and not by + multihash. When no path present, the trailing slash must be kept + (`CIDV1_BASE32/`). +- A b58-encoded multihash (a.k.a CIDV0), corresponding to the Sum() of: + - An IPNS-Path: + - `/ipns/IPNS` when the IPNS name is NOT a CID. + - The b58-encoded-multihash extracted from an IPNS key when the IPNS key + is a CID. + - An IPFS-Path: `b58-encoded-multihash/P/A/T/H` where the multihash is + extracted from the CID in `/ipfs/CID/P/A/T/H` (The multihash and the CID + are the same in the case of CIDV0). The `/P/A/T/H` component is optional + and should not have a trailing `/`. 
+ +The latter form allows blocking by double-hash using any hashing function of +choice. Implementations will have to hash requests using all the hashing functions +used in the denylist, so we recommend sticking to one. + +Conveniently, the latter form allows using a b58-encoded sha256 multihashes +(usual form of CIDv0 - `Qmxxx...`), so that double-hashes can be like: + +``` +$ printf "QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/my/path" | ipfs add --raw-leaves --only-hash --quiet | ipfs cid format -f '%M' -b base58btc +QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8 +``` + +The rule `//QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8` will block `/ipfs/bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze/my/path`, with `QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8` being the base58-encoded multihash contained in `bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze`. -##### `//\` +We can convert any CID to its multihash with: + +``` +$ ipfs cid format -f '%M' -b base58btc bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze +QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768 +``` -Blocks a double-hashed item, which can be: +Blocking layer recommendation: NameSystem + PathResolver + BlockService. -- The hash of a CIDv1base32[+path]: legacy badbits, block-by-cid format -- The hash of an IPNS path `/ipns/*`. -- The hash of a CIDv0[+path] +In order to check for a matching rule, the PathResolver should: -Blocking layer recommendation: Resolver + BlockService. +- IPFS path: convert the CID to v1base32 and hash `CIDV1BASE32/PATH` with the + hashing functions used in the denylist. Match against declared double-hashes. +- IPFS path: convert the CID to CIDv0 and hash `CIDV0/PATH` without trailing `/` with the hashing functions used in the denylist. Match against declared double-hashes. 
+- IPNS path: -In order to check for a matching rule, the Resolver should: +The NameSystem should: -- IPFS path: convert the CID to v1base32 and hash the path without the `/ipfs/` prefix. -- IPFS path: convert the CID to v0 and hash the path without the `/ipfs/` prefix. -- IPNS path: hash the path "as is". +- If NAME is a domain name: Hash `/ipns/NAME` with the hashing functions used in the denylist. Match against declared double-hashes. +- If NAME is a CID, extract the multihash, encoded with baseb58btc and hash it with the hashing functions used in the denylist. Match against declared double-hashes. -The Blockservice should, in turn do the following to check for matches: +The BlockService should: -- Convert the CID to v1base32 (keeping the codec) and hash the CID string -- Convert the CID to v0 and hash the CID string +- Convert the CID to `CIDV1BASE32/` (keeping the CID codec and adding a slash at the end) and hash it with the hashing functions used in the denylist. Match against declared double-hashes. -When blocking by double-hashing the recommendation is to use the result of hasing `[/]`. This ensures that blocking by multihash happens. +- Convert the CID to b58-encoded-multihash (that is CIDv0) and hash the CID string. -##### `/mime/\` `/mime/*` +##### `/mime/MIMETYPE` `/mime/*` Blocks content detected to be of the given type. `/mime/*` blocks all the mimetypes and is meant to work with allow rules (all mimetypes blocked except specific ones). @@ -278,24 +332,31 @@ Our recommendation is that /mime/ rules automatically set IPFS clients into a BlockService layer, and content type is checked at the Unixfs layer, as the blocks get assembled into an actual files. That should cover gateway usage. -#### Allow block items +#### Allow (or negated) rules -Block items can be prepended by `+`, signaling that they are to be allowed and -triumphing over other negative entries. 
Implementations should check first if -items have been allowed, before processing blocking rules. Examples: +Block items can be prepended by `+`, that items matching the rule are to be allowed +rather than blocked. + +This can be used to undo existing rules, but also to add concrete exceptions to wider rules. Order matters, and Allow rules must come AFTER other existing rules. + +Implementations should parse rules in general, and match them in inverse order +as they appear in the denylist, so an explicit Allow rule will be evaluated +before previously defined Deny rules, and can return non-blocked status for an +item before further processing. + +Examples: ``` +/ipfs/QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/photo* ++/ipfs/QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/photo123.jpg /mime/* +/mime/text/plain -/ipfs//photo* -+/ipfs//photo123.jpg ++/ipns/my.domain +/ipns/my.domain ``` -#### Negative block items - -Block items can be prepended `-`, signaling that they undo a block item found -previously on the list. This allows to remove entries from a list by just -negating them in an append-only fashion. +In this example, `/ipns/my.domain` stays blocked because the deny rule happens +after the allow one. #### Hint list @@ -305,6 +366,9 @@ A hint list is an optional space-separated list of hints associated with specifi hintA:v1 hintB:v2 hintC:v3 ``` +Block items and hints are separated by one or more consecutive instances of +the "space" character. 
+ ### Test fixtures TODO From ae1a22560ab1d88717b4a2947b4621c8c6f186ff Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Wed, 29 Mar 2023 17:32:50 +0200 Subject: [PATCH 03/32] Improve denylist example, add comment and empty lines --- COMPACT_DENYLIST_FORMAT.md | 72 ++++++++++++++++++++++++++++++-------- 1 file changed, 58 insertions(+), 14 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 1770bb777..bd7ca72ad 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -126,30 +126,74 @@ name: IPFSCorp blocking list description: A collection of bad things we have found in the universe author: abuse-ipfscorp@example.com hints: - gateway_status: 410 - enable_legacy_doublehash: true + hint: value + hint2: value2 --- -/ipfs/QmYvggjprWhRYiDhyZ57gtkadEBhcfPScGyx1AofkgAk3Q reason:DCMA -/ipfs/bafkreigtnn3j24rs5q2qhx3kleisjngot5w2lgd32armqbv2upeaqesrna -/ipfs/bafkreifhlk37n6gcnt6pjmvdtqdzxrok35wh46jjobrqqtqckbn4ygk3yy/dirty%20movies/xxx.mp4 -/ipfs/bafkreidxe6kfaurhhxzkh6wsvbqwzcu5eluwm57a62gftxwt6w4zuiljte/* -/ipfs/bafkreigtdosqa2q542lhmt74aprtjsomobar6x3gp3zlrwdnyh56euphay/pics/secret* -/ipns/example.com gateway_status:410 -/ipns/QmdxLxa4Sz6ygEhL9FKwfrknL9xXoeFJRFCDS8bQwFmFDz -/ipns/example.com/hidden/* -//f36d4ce6cf64f2aac2c8cab023be1af1842681bad77fb3b379740e2f76f10a31 -/mime/* --/mime/txt +# Blocking by CID - blocks wrapped multihash. +# Does not block subpaths. 
+/ipfs/bafybeihvvulpp4evxj7x7armbqcyg6uezzuig6jp3lktpbovlqfkuqeuoq + +# Block all subpaths +/ipfs/QmdWFA9FL52hx3j9EJZPQP1ZUH8Ygi5tLCX2cRDs6knSf8/* + +# Block some subpaths (equivalent rules) +/ipfs/Qmah2YDTfrox4watLCr3YgKyBwvjq8FJZEFdWY6WtJ3Xt2/test* +/ipfs/QmTuvSQbEDR3sarFAN9kAeXBpiBCyYYNxdxciazBba11eC/test/* + +# Block some subpaths with exceptions +/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked* ++/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blockednot ++/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked/not ++/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked/exceptions* + +# Block IPNS domain name +/ipns/domain.example + +# Block IPNS domain name and path +/ipns/domain2.example/path + +# Block IPNS key - blocks wrapped multihash. +/ipns/k51qzi5uqu5dhmzyv3zac033i7rl9hkgczxyl81lwoukda2htteop7d3x0y1mf + +# Block all mime types with exceptions +/mime/image/* ++/mime/image/gif + +# Legacy CID double-hash block +# sha256(bafybeiefwqslmf6zyyrxodaxx4vwqircuxpza5ri45ws3y5a62ypxti42e/) +# blocks only this CID +//d9d295bde21f422d471a90f2a37ec53049fdf3e5fa3ee2e8f20e10003da429e7 + +# Legacy Path double-hash block +# Blocks bafybeiefwqslmf6zyyrxodaxx4vwqircuxpza5ri45ws3y5a62ypxti42e/path +# but not any other paths. +//3f8b9febd851873b3774b937cce126910699ceac56e72e64b866f8e258d09572 + +# Double hash CID block +# base58btc-sha256-multihash(QmVTF1yEejXd9iMgoRTFDxBv7HAz9kuZcQNBzHrceuK9HR) +# Blocks bafybeidjwik6im54nrpfg7osdvmx7zojl5oaxqel5cmsz46iuelwf5acja +# and QmVTF1yEejXd9iMgoRTFDxBv7HAz9kuZcQNBzHrceuK9HR etc. 
by multihash +//QmX9dhRcQcKUw3Ws8485T5a9dtjrSCQaUAHnG4iK9i4ceM + +# Double hash Path block using blake3 hashing +# base58btc-blake3-multihash(gW7Nhu4HrfDtphEivm3Z9NNE7gpdh5Tga8g6JNZc1S8E47/path) +# Blocks /ipfs/bafyb4ieqht3b2rssdmc7sjv2cy2gfdilxkfh7623nvndziyqnawkmo266a/path +# /ipfs/bafyb4ieqht3b2rssdmc7sjv2cy2gfdilxkfh7623nvndziyqnawkmo266a/path +# /ipfs/f01701e20903cf61d46521b05f926ba1634628d0bba8a7ffb5b6d5a3ca310682ca63b5ef0/path etc... +# But not /path2 +//QmbK7LDv5NNBvYQzNfm2eED17SNLt1yNMapcUhSuNLgkqz ``` #### High level list format -A denylist is made of an optional header and a list of blockitems separated by newlines. +A denylist is made of an optional header and a list of blockitems separated by newlines. Comment lines start with `#`. Empty lines are allowed. ```
--- [hint_list] + +# comment [hint_list] ... ``` From f989bd0ec6c4cbc65ad89414e88c9ae51a987846 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Wed, 29 Mar 2023 17:34:07 +0200 Subject: [PATCH 04/32] Fix spec title in the file --- COMPACT_DENYLIST_FORMAT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index bd7ca72ad..67380c06a 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -1,4 +1,4 @@ -# Specification Template +# Compact Denylist Format specification ![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) From 9deed0309f64321be0fe5bb4a0e2813b20ceffb1 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Wed, 29 Mar 2023 21:01:10 +0200 Subject: [PATCH 05/32] Add information about test fixtures --- COMPACT_DENYLIST_FORMAT.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 67380c06a..9650582d3 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -20,6 +20,8 @@ Denylists provide a way to indicate what content should be blocked by IPFS. - [Introduction](#introduction) - [Specification](#specification) + - [Denylist File extension, locations and order](#denylist-file-extension-locations-and-order) + - [Denylist format](#denylist-format) - [Test fixtures](#test-fixtures) - [Security](#security) - [Privacy and User Control](#privacy-and-user-control) @@ -415,7 +417,12 @@ the "space" character. ### Test fixtures -TODO +Denylist parsing and correct behaviour can be tested using the +[test.deny](https://github.com/ipfs-shipyard/nopfs/blob/master/tester/test.deny) +denylist, which provides example rules and describes the expected behaviour in +detail. + +In particular, a [Blocker implementation validator](https://github.com/ipfs-shipyard/nopfs/tree/master/tester) is provided in Go, and can be adapted to other languages if needed. 
### Security From dd875527048ed426544c401f26423746468556e6 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Wed, 29 Mar 2023 21:05:22 +0200 Subject: [PATCH 06/32] Add mentions to NoPFS --- COMPACT_DENYLIST_FORMAT.md | 5 +++++ IPIP/383-compact-denylist-format.md | 6 ++++-- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 9650582d3..140360f28 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -25,6 +25,7 @@ Denylists provide a way to indicate what content should be blocked by IPFS. - [Test fixtures](#test-fixtures) - [Security](#security) - [Privacy and User Control](#privacy-and-user-control) +- [Implementations](#implementations) ## Introduction @@ -448,6 +449,10 @@ Double-hashing is particularly useful when the denylist is meant to be shared. D - The presence of a single double-hashed block item makes necessary that the implementation hashes every CID and CID+path that needs to be checked, which has a performance impact. - In general, it is good that users can inspect the nature of the content blocked if they wish to, so we recommend not using double-hashing by default as it helps transparency (i.e. blocking due to copyright claims). +## Implementations + +- [NoPFS](https://github.com/ipfs-shipyard/nopfs): An implementation of IPIP-383 which add supports for content blocking to the go-ipfs stack and particularly to Kubo. + ## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). diff --git a/IPIP/383-compact-denylist-format.md b/IPIP/383-compact-denylist-format.md index a4efb1e51..853e503fc 100644 --- a/IPIP/383-compact-denylist-format.md +++ b/IPIP/383-compact-denylist-format.md @@ -7,7 +7,9 @@ ## Summary This IPIP introduces a line-based denylist format for content blocking on IPFS -focused on simplicity, scalability and ease-of-use, +focused on simplicity, scalability and ease-of-use. 
+ +A reference Go implementation of a denylist parser and Blocker component for the Go-IPFS stack exists at https://github.com/ipfs-shipyard/nopfs. ## Motivation @@ -96,7 +98,7 @@ Users and developers will benefit from a list format that is easy to work with b The current Protocol Labs denylist format https://badbits.dwebops.pub/denylist.json can be easily converted into the -proposed compact format. +proposed compact format. This is shown at https://badbits.dwebops.pub/denylist.json. ### Alternatives From 5c270832eb9f257c81fffef23e44a4be22991435 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Fri, 16 Jun 2023 20:18:07 +0200 Subject: [PATCH 07/32] Update IPIP/383-compact-denylist-format.md Co-authored-by: Oli Evans --- IPIP/383-compact-denylist-format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/IPIP/383-compact-denylist-format.md b/IPIP/383-compact-denylist-format.md index 853e503fc..0db768d6b 100644 --- a/IPIP/383-compact-denylist-format.md +++ b/IPIP/383-compact-denylist-format.md @@ -98,7 +98,7 @@ Users and developers will benefit from a list format that is easy to work with b The current Protocol Labs denylist format https://badbits.dwebops.pub/denylist.json can be easily converted into the -proposed compact format. This is shown at https://badbits.dwebops.pub/denylist.json. +proposed compact format. This is shown at https://badbits.dwebops.pub/badbits.deny. ### Alternatives From f16344fb3f2feb0e2ccb89632ef5166d17a31271 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Fri, 16 Jun 2023 20:18:14 +0200 Subject: [PATCH 08/32] Update COMPACT_DENYLIST_FORMAT.md Co-authored-by: Oli Evans --- COMPACT_DENYLIST_FORMAT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 140360f28..dbb3f5d80 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -253,7 +253,7 @@ Blocking layer recommendation: BlockService. 
##### `/ipfs/CID/PATH` IPFS-Path-Rule: Blocks the exact ipfs path that is referenced from the -multihash embedded the CID before attempting to resolve it. It does not block +multihash embedded in the CID before attempting to resolve it. It does not block the CID that the path resolves to. Note `/ipfs/CID/path` and `/ipfs/CID/path/` are equivalent rules. From a9fb5b23bf32079f6ac55d497c6212358169bca6 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Mon, 26 Jun 2023 10:27:50 +0200 Subject: [PATCH 09/32] Update COMPACT_DENYLIST_FORMAT.md Co-authored-by: Bumblefudge --- COMPACT_DENYLIST_FORMAT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index dbb3f5d80..8f6aab7a7 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -430,7 +430,7 @@ In particular, a [Blocker implementation validator](https://github.com/ipfs-ship This proposal takes into account security: - Denylist headers and line-length limits are well specified to avoid malformed lists to cause things like large memory usage while parsing. -- Supported type of blocks have been though out to avoid amplified consumption of resources or side effects (i.e. downloading of additional dag-blocks) during the implementation. +- Supported type of blocks have been thought out to avoid amplified consumption of resources or side effects (i.e. downloading of additional dag-blocks) during the implementation. - Paths are sanitized and follow the same encoding rules as URLs (RFC 3986), so that existing and safe parsing can be done with regular tooling. - Official and custom-hint systems allow the introduction of additional features that can co-exist with the specified format without needing to be supported. 
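Reviewer note on the IPFS-Path-Rule touched above: the following is a minimal sketch, not part of the patch, of how a parser might normalize an exact `/ipfs/CID/PATH` rule so that `/ipfs/CID/path` and `/ipfs/CID/path/` compare as equal. The function name `split_ipfs_rule` is hypothetical, and decoding percent-encoded segments at parse time is an assumption based on the statement that paths follow URL encoding rules (RFC 3986). Prefix rules ending in `*` are deliberately not handled here.

```python
from typing import Tuple
from urllib.parse import unquote


def split_ipfs_rule(rule: str) -> Tuple[str, str]:
    """Split an exact `/ipfs/CID/PATH` rule into (cid, normalized subpath).

    Hypothetical helper for illustration; it does not validate the CID.
    """
    prefix = "/ipfs/"
    if not rule.startswith(prefix):
        raise ValueError(f"not an /ipfs/ rule: {rule!r}")
    cid, _, subpath = rule[len(prefix):].partition("/")
    if not cid:
        raise ValueError(f"missing CID in rule: {rule!r}")
    # Assumption: percent-encoded segments are decoded before comparison.
    subpath = unquote(subpath)
    # `/ipfs/CID/path` and `/ipfs/CID/path/` are equivalent rules,
    # so the trailing slash is dropped.
    return cid, subpath.rstrip("/")
```

A real implementation would additionally parse the CID and key the rule by its multihash, as the specification requires; this sketch keeps the CID as an opaque string.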
From 6569655012f0ebc2fedb1843ec4728178a9e2c0a Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Mon, 26 Jun 2023 10:44:10 +0200 Subject: [PATCH 10/32] Be explicit that last-matching-rule wins on negations --- COMPACT_DENYLIST_FORMAT.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 8f6aab7a7..f71797293 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -143,7 +143,7 @@ hints: /ipfs/Qmah2YDTfrox4watLCr3YgKyBwvjq8FJZEFdWY6WtJ3Xt2/test* /ipfs/QmTuvSQbEDR3sarFAN9kAeXBpiBCyYYNxdxciazBba11eC/test/* -# Block some subpaths with exceptions +# Block some subpaths with exceptions: last-matching-rule wins (!) /ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked* +/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blockednot +/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked/not @@ -381,8 +381,7 @@ blocks get assembled into an actual files. That should cover gateway usage. #### Allow (or negated) rules -Block items can be prepended by `+`, that items matching the rule are to be allowed -rather than blocked. +Block items can be prepended by `+`, which means that items matching the rule are to be allowed rather than blocked. This can be used to undo existing rules, but also to add concrete exceptions to wider rules. Order matters, and Allow rules must come AFTER other existing rules. From 7a40db0ae4eec94e3a31d5e2ef41da5206f9d4b8 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Mon, 26 Jun 2023 10:48:25 +0200 Subject: [PATCH 11/32] compact denylist: switch to "!" for negations --- COMPACT_DENYLIST_FORMAT.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index f71797293..dd5199e56 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -145,9 +145,9 @@ hints: # Block some subpaths with exceptions: last-matching-rule wins (!) 
/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked* -+/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blockednot -+/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked/not -+/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked/exceptions* +!/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blockednot +!/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked/not +!/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked/exceptions* # Block IPNS domain name /ipns/domain.example @@ -160,7 +160,7 @@ hints: # Block all mime types with exceptions /mime/image/* -+/mime/image/gif +!/mime/image/gif # Legacy CID double-hash block # sha256(bafybeiefwqslmf6zyyrxodaxx4vwqircuxpza5ri45ws3y5a62ypxti42e/) @@ -381,7 +381,7 @@ blocks get assembled into an actual files. That should cover gateway usage. #### Allow (or negated) rules -Block items can be prepended by `+`, which means that items matching the rule are to be allowed rather than blocked. +Block items can be prepended by `!`, which means that items matching the rule are to be allowed rather than blocked. This can be used to undo existing rules, but also to add concrete exceptions to wider rules. Order matters, and Allow rules must come AFTER other existing rules. @@ -394,15 +394,15 @@ Examples: ``` /ipfs/QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/photo* -+/ipfs/QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/photo123.jpg +!/ipfs/QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/photo123.jpg /mime/* -+/mime/text/plain -+/ipns/my.domain +!/mime/text/plain +!/ipns/my.domain /ipns/my.domain ``` In this example, `/ipns/my.domain` stays blocked because the deny rule happens -after the allow one. +AFTER the allow one. 
#### Hint list From f610548ceb2e41c25ec5ce055ce72a040865e667 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 11:44:04 +0200 Subject: [PATCH 12/32] compact denylist: clarify denylist encoding as text file --- COMPACT_DENYLIST_FORMAT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index dd5199e56..8731e82a2 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -189,7 +189,7 @@ hints: #### High level list format -A denylist is made of an optional header and a list of blockitems separated by newlines. Comment lines start with `#`. Empty lines are allowed. +A denylist is a UTF-8 encoded text file made of an optional header and a list of blockitems separated by newlines (`\n`). Comment lines start with `#`. Empty lines are allowed. ```
From 312e6251c19c880778aae0dfb701aa61abeb781d Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 11:45:03 +0200 Subject: [PATCH 13/32] compact denylist: clarify exact number of bytes in header. Set to 1MiB --- COMPACT_DENYLIST_FORMAT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 8731e82a2..c5dddb4e8 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -207,7 +207,7 @@ The list header is a YAML block: - Must be valid YAML - Fully optional -- 1KB maximum size +- 1 MiB (1048576 bytes) maximum size - Delimited by a line containing `---` at the end (document separator) Known-fields (they must be lowercase): From 4f11c627437f907695911aae1ea44961cf66356b Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 11:45:48 +0200 Subject: [PATCH 14/32] compact denylist: clarify that parsing should abort on unsupported versions --- COMPACT_DENYLIST_FORMAT.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index c5dddb4e8..fc7db103c 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -212,7 +212,9 @@ The list header is a YAML block: Known-fields (they must be lowercase): -- `version`: the denylist format version. Defaults to 1 when not specified. +- `version`: the denylist format version. Defaults to 1 when not + specified. Implementations should reject parsing denylist versions that they + do not support. 
- `name` - `description` - `author` From aa5f89bcb3ed9939d2a7056e66480238e1bd871d Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 11:46:18 +0200 Subject: [PATCH 15/32] compact denylist: clarify hints and add custom field support --- COMPACT_DENYLIST_FORMAT.md | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index fc7db103c..043dea33b 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -218,14 +218,39 @@ Known-fields (they must be lowercase): - `name` - `description` - `author` -- `hints`: a map of *hints*. See section below for known hints +- `hints`: a map of *hints*. See below. + +The known fields-list may be expanded as this specification is developed. To +include custom information in the header, we recommend using custom fields or hints. + +Custom fields: + +- Fields starting with `x-` or `X-` are considered custom. Users can freely + include them in the header and implementations can support them as needed. +- Custom fields are not a property of each block item as "header hints" are + considered to be. + +In order to parse the header, implementations should read the denylist until a +`---` is found or the 1MiB limit is reached. If the `---` is found, they +should attempt parsing the header as YAML: + +- If parsing the header fails, they should abort and signal an error. +- If the size limit is reached, they should assume the list includes no header + and start parsing block items from the beginning of the denylist. A header + that was too large will be parsed line by line as block items and error + accordingly line per line, without causing excessive resource allocation. + #### Hints -A *hint* is a key-value duple associated to the denylist as a whole (part of the header), or to a specific \. +A *hint* is a key-value duple associated to a \. the denylist as +a whole (part of the header), or to a specific \. 
-Header hints can be used to set denylist-wide options or information that -implementations can choose to interpret or not. +A list of hints can optionally follow every \ as show +above. Hints can also be specified in the denylist header ("header +hints"). This is equivalent to adding the same hints to every single +\ in the denylist. Implementations should associate both the +specific and the header hints to every block rule. #### List body From 358677eacacd2d161d2936b756b9ad4907c54de9 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 11:46:47 +0200 Subject: [PATCH 16/32] compact denylist: make explicit different behaviours when finding invalid rules --- COMPACT_DENYLIST_FORMAT.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 043dea33b..e63030142 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -266,6 +266,13 @@ A block item represents a rule to enable content-blocking: - `CID` elements represent a CID (either V0 or V1). - `CIDv0` are for us equivalent to baseb58btc-encoded sha256 multihashes although they are not the same thing (a CIDV0 carries implicit codec (dag-pb) and multibase information (b58btc). When we say a b58-encoded multihash needs to be extracted from the CID, this usually is a no-op in case of CIDv0s. +Implementations must decide what to do when processing a denylist and an invalid block-item rule is found: + +- Prominently log the parsing error (always recommended) +- Abort parsing and return a general error OR +- Continue processing the list, discarding unrecognized rules + + ##### `/ipfs/CID` CID-rule: Blocks a specific multihash. 
If the CID is a V1, it blocks the From 7214d0bcff3847ef21e0b61719327ccc3845fc3a Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 11:47:13 +0200 Subject: [PATCH 17/32] compact denylist: remove mimetype blocking --- COMPACT_DENYLIST_FORMAT.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index e63030142..263d688b9 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -402,17 +402,6 @@ The BlockService should: - Convert the CID to b58-encoded-multihash (that is CIDv0) and hash the CID string. -##### `/mime/MIMETYPE` `/mime/*` - -Blocks content detected to be of the given type. `/mime/*` blocks all the mimetypes and is meant to work with allow rules (all mimetypes blocked except specific ones). - -Blocking layer recommendation: Unixfs - -Our recommendation is that /mime/ rules automatically set IPFS clients into a -"unixfs only" mode where only unixfs (+raw blocks) are allowed at the -BlockService layer, and content type is checked at the Unixfs layer, as the -blocks get assembled into an actual files. That should cover gateway usage. - #### Allow (or negated) rules Block items can be prepended by `!`, which means that items matching the rule are to be allowed rather than blocked. 
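Reviewer note on the Allow rules section: the "last matching rule wins" behaviour described in this series (rules are matched in inverse order of appearance) can be sketched as below. This is an illustration under stated assumptions, not the NoPFS implementation: `rule_matches` and `is_blocked` are hypothetical names, and rules and requests are compared as plain strings, whereas a real blocker would compare by multihash and dispatch per rule type.

```python
from typing import List


def rule_matches(rule_path: str, request_path: str) -> bool:
    """Match one rule path against a request path (string comparison only).

    A trailing `*` makes the rule a prefix rule. `/ab*` and `/ab/*` are
    treated as equivalent, as the specification states.
    """
    request = request_path.rstrip("/")
    if rule_path.endswith("*"):
        prefix = rule_path[:-1].rstrip("/")
        return request.startswith(prefix)
    return rule_path.rstrip("/") == request


def is_blocked(rules: List[str], request_path: str) -> bool:
    """Walk the rules in inverse order; the last matching rule decides."""
    for rule in reversed(rules):
        allow = rule.startswith("!")
        path = rule[1:] if allow else rule
        if rule_matches(path, request_path):
            return not allow
    return False  # no rule matched, so the item is not blocked
```

Fed the rules from the example in the specification (`/ipfs/Qmec…/photo*`, then `!/ipfs/Qmec…/photo123.jpg`, then `!/ipns/my.domain`, then `/ipns/my.domain`), this sketch reports `photo123.jpg` as allowed, other `photo*` paths as blocked, and `/ipns/my.domain` as blocked, because its deny rule comes after the allow rule.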
From 00e56b58ebe5f3a49435921329e3e94fb7efc236 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 11:47:36 +0200 Subject: [PATCH 18/32] compact denylist: word-wrap some lines --- COMPACT_DENYLIST_FORMAT.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 263d688b9..734593ac2 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -373,7 +373,11 @@ $ printf "QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/my/path" | ipfs add --r QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8 ``` -The rule `//QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8` will block `/ipfs/bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze/my/path`, with `QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8` being the base58-encoded multihash contained in `bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze`. +The rule `//QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8` will block +`/ipfs/bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze/my/path`, +with `QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8` being the base58-encoded +multihash contained in +`bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze`. We can convert any CID to its multihash with: From e0cae307b050e0e944a8c015628499ad6d59f939 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 16:32:52 +0200 Subject: [PATCH 19/32] Fix typo Co-authored-by: Marcin Rataj --- COMPACT_DENYLIST_FORMAT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 734593ac2..b297bf1c1 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -398,7 +398,7 @@ In order to check for a matching rule, the PathResolver should: The NameSystem should: - If NAME is a domain name: Hash `/ipns/NAME` with the hashing functions used in the denylist. Match against declared double-hashes. 
-- If NAME is a CID, extract the multihash, encoded with baseb58btc and hash it with the hashing functions used in the denylist. Match against declared double-hashes. +- If NAME is a CID, extract the multihash, encode it with baseb58btc and hash it with the hashing functions used in the denylist. Match against declared double-hashes. The BlockService should: From 5949f10ccd1ca24e184cf26c4ee3b61205d718f7 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 16:55:14 +0200 Subject: [PATCH 20/32] Address some comments from @lidel clarifying mostly double-hash processing --- COMPACT_DENYLIST_FORMAT.md | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index b297bf1c1..8ab271957 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -353,8 +353,8 @@ Doublehash-Rule: Blocks using double-hashed item, which can be: (`CIDV1_BASE32/`). - A b58-encoded multihash (a.k.a CIDV0), corresponding to the Sum() of: - An IPNS-Path: - - `/ipns/IPNS` when the IPNS name is NOT a CID. - - The b58-encoded-multihash extracted from an IPNS key when the IPNS key + - `/ipns/NAME` when the IPNS name is NOT a CID. + - The b58-encoded-multihash extracted from an IPNS name when the IPNS name is a CID. - An IPFS-Path: `b58-encoded-multihash/P/A/T/H` where the multihash is extracted from the CID in `/ipfs/CID/P/A/T/H` (The multihash and the CID @@ -388,17 +388,22 @@ QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768 Blocking layer recommendation: NameSystem + PathResolver + BlockService. -In order to check for a matching rule, the PathResolver should: +In order to check for a matching rule, the PathResolver working with `CID/PATH` elements should: -- IPFS path: convert the CID to v1base32 and hash `CIDV1BASE32/PATH` with the - hashing functions used in the denylist. Match against declared double-hashes. 
-- IPFS path: convert the CID to CIDv0 and hash `CIDV0/PATH` without trailing `/` with the hashing functions used in the denylist. Match against declared double-hashes. -- IPNS path: +- Convert the CID to v1base32 and hash `CIDV1BASE32/PATH` with the hashing + functions used in the denylist. Match against declared double-hashes. An + empty path means that the value to hash is `CIDV1BASE32/` (with the trailing + slash). This is the legacy hashing so the hashing function is usually + sha256 and the matched rules are legacy badbits anchor rules. +- Convert the CID to CIDv0 and hash `CIDV0/PATH` without trailing `/` with the + hashing functions used in the denylist. Match against declared + double-hashes. The NameSystem should: -- If NAME is a domain name: Hash `/ipns/NAME` with the hashing functions used in the denylist. Match against declared double-hashes. -- If NAME is a CID, extract the multihash, encode it with baseb58btc and hash it with the hashing functions used in the denylist. Match against declared double-hashes. +- If NAME is a CID (try parsing as CID first), extract the multihash, encoded with base58btc and hash it with the hashing functions used in the denylist. Match against declared double-hashes. +- Otherwise, assume NAME is a domain name: Hash `/ipns/NAME` with the hashing functions used in the denylist. Match against declared double-hashes. 
+ The BlockService should: From 0f8e9f1d84051f3f8f8c302e318e170d108b565c Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 16:59:54 +0200 Subject: [PATCH 21/32] Linter errors --- COMPACT_DENYLIST_FORMAT.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 8ab271957..1e6ad7cc5 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -240,7 +240,6 @@ should attempt parsing the header as YAML: that was too large will be parsed line by line as block items and error accordingly line per line, without causing excessive resource allocation. - #### Hints A *hint* is a key-value duple associated to a \. the denylist as @@ -272,7 +271,6 @@ Implementations must decide what to do when processing a denylist and an invalid - Abort parsing and return a general error OR - Continue processing the list, discarding unrecognized rules - ##### `/ipfs/CID` CID-rule: Blocks a specific multihash. If the CID is a V1, it blocks the From 2a702c33b8f7124c5c99b97d3ed71ef108308799 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Tue, 3 Oct 2023 17:02:31 +0200 Subject: [PATCH 22/32] Linter errors --- COMPACT_DENYLIST_FORMAT.md | 1 - 1 file changed, 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 1e6ad7cc5..f955ec7a1 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -402,7 +402,6 @@ The NameSystem should: - If NAME is a CID (try parsing as CID first), extract the multihash, encoded with base58btc and hash it with the hashing functions used in the denylist. Match against declared double-hashes. - Otherwise, assume NAME is a domain name: Hash `/ipns/NAME` with the hashing functions used in the denylist. Match against declared double-hashes. - The BlockService should: - Convert the CID to `CIDV1BASE32/` (keeping the CID codec and adding a slash at the end) and hash it with the hashing functions used in the denylist. 
Match against declared double-hashes. From 068383cb5e55f2eac7b820213422d48e6d033760 Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Mon, 23 Oct 2023 12:27:14 +0200 Subject: [PATCH 23/32] Make clarifying notes about `/ipfs/CID/*`. --- COMPACT_DENYLIST_FORMAT.md | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index f955ec7a1..92d5af49f 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -133,13 +133,11 @@ hints: hint2: value2 --- # Blocking by CID - blocks wrapped multihash. -# Does not block subpaths. +# Does not block subpaths per se, but might stop an implementation +# from resolving subpaths if this block is not retrievable. /ipfs/bafybeihvvulpp4evxj7x7armbqcyg6uezzuig6jp3lktpbovlqfkuqeuoq -# Block all subpaths -/ipfs/QmdWFA9FL52hx3j9EJZPQP1ZUH8Ygi5tLCX2cRDs6knSf8/* - -# Block some subpaths (equivalent rules) +# Blocking by subpath (equivalent rules) /ipfs/Qmah2YDTfrox4watLCr3YgKyBwvjq8FJZEFdWY6WtJ3Xt2/test* /ipfs/QmTuvSQbEDR3sarFAN9kAeXBpiBCyYYNxdxciazBba11eC/test/* @@ -280,7 +278,11 @@ When users want to block by multihash directly, they must base58btc-encoded multihashes. This rule does not block subpaths that start at this CID, only the CID itself. -Blocking layer recommendation: BlockService. +Blocking layer recommendation: BlockService (or PathResolver if wanting to +block by path only). + +**NOTE**: See note in `/ipfs/CID/*` below, as to why this rule may effectively + block all subpaths too. ##### `/ipfs/CID/PATH` @@ -298,11 +300,22 @@ IPFS-Path-Prefix-Rule: Blocks any multihash-path combination starting with the the given path prefix. `/*` includes the empty path. Thus `/ipfs/CID/*` blocks the CID itself, and any paths. Examples: -- `/ipfs/CID/*` : blocks CID (by multihash) and any path before resolving. +- `/ipfs/CID/*` : blocks CID (by multihash) and any path BEFORE resolving. 
- `/ipfs/CID/ab*`: blocks any path derived from the CID (multihash) and starting with "ab", including "ab" - `/ipfs/CID/ab/*`: equivalent to the above. -Blocking layer recommendation: PathResolver + (BlockService if the CID itself is blocked too). +Blocking layer recommendation: PathResolver + +**NOTE**: When the rule `/ipfs/CID` exists and BlockService-level blocking + exists, subpaths of CID will effectively be blocked in the process of being + resolved, as we would disallow fetching the root CID, even if the subpath + itself is not block. This causes `/ipfs/CID` to behave like + `/ipfs/CID/*`. In cases where all requests go through the PathResolver, + blocking at the BlockService could be disabled. In that case fetching + `/ipfs/CID` would be allowed even if that rule existed, when the process is + part of the resolution of a subpath that is not blocked. Implementations can + decide which model they want to adopt. + ##### `/ipns/IPNS` From 675254068c729a5371116ffb451f721f8e0a40ae Mon Sep 17 00:00:00 2001 From: Hector Sanjuan Date: Mon, 23 Oct 2023 13:00:38 +0200 Subject: [PATCH 24/32] Do not equate multihash to CIDv0 in double-hash spec --- COMPACT_DENYLIST_FORMAT.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/COMPACT_DENYLIST_FORMAT.md b/COMPACT_DENYLIST_FORMAT.md index 92d5af49f..16f24807a 100644 --- a/COMPACT_DENYLIST_FORMAT.md +++ b/COMPACT_DENYLIST_FORMAT.md @@ -362,7 +362,7 @@ Doublehash-Rule: Blocks using double-hashed item, which can be: badbits block anchor format. It can only block by CID and not by multihash. When no path present, the trailing slash must be kept (`CIDV1_BASE32/`). -- A b58-encoded multihash (a.k.a CIDV0), corresponding to the Sum() of: +- A b58-encoded multihash, corresponding to the Sum() of: - An IPNS-Path: - `/ipns/NAME` when the IPNS name is NOT a CID. 
- The b58-encoded-multihash extracted from an IPNS name when the IPNS name From 1f8814575bc4e2c5e51f32fd55521600a0c54087 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 25 Oct 2023 23:36:44 +0200 Subject: [PATCH 25/32] chore: move denylist to website src --- COMPACT_DENYLIST_FORMAT.md => src/compact-denylist-format.md | 0 IPIP/383-compact-denylist-format.md => src/ipips/ipip-0383.md | 0 2 files changed, 0 insertions(+), 0 deletions(-) rename COMPACT_DENYLIST_FORMAT.md => src/compact-denylist-format.md (100%) rename IPIP/383-compact-denylist-format.md => src/ipips/ipip-0383.md (100%) diff --git a/COMPACT_DENYLIST_FORMAT.md b/src/compact-denylist-format.md similarity index 100% rename from COMPACT_DENYLIST_FORMAT.md rename to src/compact-denylist-format.md diff --git a/IPIP/383-compact-denylist-format.md b/src/ipips/ipip-0383.md similarity index 100% rename from IPIP/383-compact-denylist-format.md rename to src/ipips/ipip-0383.md From 79bfdc75da3122031513902c5d78aa55b2b61308 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 26 Oct 2023 01:19:37 +0200 Subject: [PATCH 26/32] ipip-383: publish the spec on the website --- src/compact-denylist-format.md | 121 +++++++++++++++++---------------- src/index.html | 11 ++- src/ipips/ipip-0383.md | 37 ++++++---- 3 files changed, 97 insertions(+), 72 deletions(-) diff --git a/src/compact-denylist-format.md b/src/compact-denylist-format.md index 16f24807a..d7625660b 100644 --- a/src/compact-denylist-format.md +++ b/src/compact-denylist-format.md @@ -1,32 +1,21 @@ -# Compact Denylist Format specification - -![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) - -**Author(s)**: -- @hsanjuan - -**Maintainer(s)**: -- @hsanjuan - -* * * - -**Abstract** - -This is the specification for [compact denlylist format V1](IPIP/383-compact-denylist-format.md). +--- +title: Compact Denylist Format +description: > + How content blocking rules can be represented as a .deny file. 
+date: 2023-10-25 +maturity: reliable +editors: + - name: Hector Sanjuan + github: hsanjuan + affiliation: + name: Protocol Labs + url: https://protocol.ai/ +tags: ['filtering'] +order: 1 +--- Denylists provide a way to indicate what content should be blocked by IPFS. -## Organization of this document - -- [Introduction](#introduction) -- [Specification](#specification) - - [Denylist File extension, locations and order](#denylist-file-extension-locations-and-order) - - [Denylist format](#denylist-format) - - [Test fixtures](#test-fixtures) - - [Security](#security) - - [Privacy and User Control](#privacy-and-user-control) -- [Implementations](#implementations) - ## Introduction A denylist is a collection of items that will be "blocked" on IPFS software. @@ -55,8 +44,8 @@ done in the future to support specific features. The denylist itself, after the header, is a collection of **block items** and block-item-specific hints. There are different flavours of block items, depending on whether we are blocking by CID, CID+path, Path, IPNS, using -double-hashing etc. but the idea is that whether an item is blocked or not can -be decided directly and ideally, prior to retrieval. +double-hashing etc. but the idea is that whether an item is blocked or not +SHOULD be decided directly and ideally, _prior to retrieval_. We include *negative block items* as well, with the idea of enabling denylists that are append-only. One of the main operational constraints we have seen is @@ -111,9 +100,9 @@ in future versions. While not pertaining to the denylist format itself, we introduce the following conventions about denylist files when they are stored in the local filesystem: -- Denylist files are named with the extension `.deny`. -- Implementations should look in `/etc/ipfs/denylists/` and - `$XDG_CONFIG_HOME/ipfs/denylists/` for denylist files. +- Denylist files MUST be named with the extension `.deny`. 
+- Implementations SHOULD look in `/etc/ipfs/denylists/` and + `$XDG_CONFIG_HOME/ipfs/denylists/` (default: `~/.config/ipfs/denylists`) for denylist files. - Denylist files are processed in alphabetical order so that rules from later denylists override rules from earlier denylists on conflict. @@ -123,7 +112,7 @@ While not pertaining to the denylist format itself, we introduce the following c The following example showcases the features and syntax of a compact denylist: -``` +```yaml version: 1 name: IPFSCorp blocking list description: A collection of bad things we have found in the universe @@ -132,7 +121,7 @@ hints: hint: value hint2: value2 --- -# Blocking by CID - blocks wrapped multihash. +# Blocking by CID is codec-agnostic (blocks by multihash). # Does not block subpaths per se, but might stop an implementation # from resolving subpaths if this block is not retrievable. /ipfs/bafybeihvvulpp4evxj7x7armbqcyg6uezzuig6jp3lktpbovlqfkuqeuoq @@ -160,42 +149,43 @@ hints: /mime/image/* !/mime/image/gif -# Legacy CID double-hash block -# sha256(bafybeiefwqslmf6zyyrxodaxx4vwqircuxpza5ri45ws3y5a62ypxti42e/) -# blocks only this CID -//d9d295bde21f422d471a90f2a37ec53049fdf3e5fa3ee2e8f20e10003da429e7 - -# Legacy Path double-hash block -# Blocks bafybeiefwqslmf6zyyrxodaxx4vwqircuxpza5ri45ws3y5a62ypxti42e/path -# but not any other paths. -//3f8b9febd851873b3774b937cce126910699ceac56e72e64b866f8e258d09572 - -# Double hash CID block +# Double-hash CID block # base58btc-sha256-multihash(QmVTF1yEejXd9iMgoRTFDxBv7HAz9kuZcQNBzHrceuK9HR) # Blocks bafybeidjwik6im54nrpfg7osdvmx7zojl5oaxqel5cmsz46iuelwf5acja # and QmVTF1yEejXd9iMgoRTFDxBv7HAz9kuZcQNBzHrceuK9HR etc. 
by multihash //QmX9dhRcQcKUw3Ws8485T5a9dtjrSCQaUAHnG4iK9i4ceM -# Double hash Path block using blake3 hashing +# Double-hash Path block using blake3 hashing # base58btc-blake3-multihash(gW7Nhu4HrfDtphEivm3Z9NNE7gpdh5Tga8g6JNZc1S8E47/path) # Blocks /ipfs/bafyb4ieqht3b2rssdmc7sjv2cy2gfdilxkfh7623nvndziyqnawkmo266a/path # /ipfs/bafyb4ieqht3b2rssdmc7sjv2cy2gfdilxkfh7623nvndziyqnawkmo266a/path # /ipfs/f01701e20903cf61d46521b05f926ba1634628d0bba8a7ffb5b6d5a3ca310682ca63b5ef0/path etc... # But not /path2 //QmbK7LDv5NNBvYQzNfm2eED17SNLt1yNMapcUhSuNLgkqz + +# Legacy CID double-hash block +# sha256(bafybeiefwqslmf6zyyrxodaxx4vwqircuxpza5ri45ws3y5a62ypxti42e/) +# blocks only this CID +//d9d295bde21f422d471a90f2a37ec53049fdf3e5fa3ee2e8f20e10003da429e7 + +# Legacy Path double-hash block +# Blocks bafybeiefwqslmf6zyyrxodaxx4vwqircuxpza5ri45ws3y5a62ypxti42e/path +# but not any other paths. +//3f8b9febd851873b3774b937cce126910699ceac56e72e64b866f8e258d09572 + ``` #### High level list format A denylist is a UTF-8 encoded text file made of an optional header and a list of blockitems separated by newlines (`\n`). Comment lines start with `#`. Empty lines are allowed. -``` -
+```yaml +[yaml_header] --- - [hint_list] +[block_item1] [optional_hint_list] # comment - [hint_list] +[block_item2] [optional_hint_list] ... ``` @@ -240,13 +230,13 @@ should attempt parsing the header as YAML: #### Hints -A *hint* is a key-value duple associated to a \. the denylist as -a whole (part of the header), or to a specific \. +A *hint* is a key-value duple associated to a `[block_item]`. the denylist as +a whole (part of the header), or to a specific `[block_item]`. -A list of hints can optionally follow every \ as show +A list of hints can optionally follow every `[block_item]` as show above. Hints can also be specified in the denylist header ("header hints"). This is equivalent to adding the same hints to every single -\ in the denylist. Implementations should associate both the +`[block_item]` in the denylist. Implementations should associate both the specific and the header hints to every block rule. #### List body @@ -281,8 +271,11 @@ the CID itself. Blocking layer recommendation: BlockService (or PathResolver if wanting to block by path only). -**NOTE**: See note in `/ipfs/CID/*` below, as to why this rule may effectively - block all subpaths too. +:::warning + +See note in `/ipfs/CID/*` below, as to why this rule may effectively block all subpaths too. + +::: ##### `/ipfs/CID/PATH` @@ -306,7 +299,9 @@ blocks the CID itself, and any paths. Examples: Blocking layer recommendation: PathResolver -**NOTE**: When the rule `/ipfs/CID` exists and BlockService-level blocking +:::warning + +When the rule `/ipfs/CID` exists and BlockService-level blocking exists, subpaths of CID will effectively be blocked in the process of being resolved, as we would disallow fetching the root CID, even if the subpath itself is not block. This causes `/ipfs/CID` to behave like @@ -316,6 +311,7 @@ Blocking layer recommendation: PathResolver part of the resolution of a subpath that is not blocked. Implementations can decide which model they want to adopt. 
+::: ##### `/ipns/IPNS` @@ -423,6 +419,8 @@ The BlockService should: #### Allow (or negated) rules +The specification syntax examples describe a `.deny` list of items to block (deny). + Block items can be prepended by `!`, which means that items matching the rule are to be allowed rather than blocked. This can be used to undo existing rules, but also to add concrete exceptions to wider rules. Order matters, and Allow rules must come AFTER other existing rules. @@ -446,12 +444,19 @@ Examples: In this example, `/ipns/my.domain` stays blocked because the deny rule happens AFTER the allow one. +:::note + +Implementations MAY reuse denylist format for `.allow` files, where everything +is blocked by default, and only matching items are allowed. + +::: + #### Hint list A hint list is an optional space-separated list of hints associated with specific block items in the form: ``` - hintA:v1 hintB:v2 hintC:v3 +[block_item] hintA:v1 hintB:v2 hintC:v3 ``` Block items and hints are separated by one or more consecutive instances of @@ -464,7 +469,7 @@ Denylist parsing and correct behaviour can be tested using the denylist, which provides example rules and describes the expected behaviour in detail. -In particular, a [Blocker implementation validator](https://github.com/ipfs-shipyard/nopfs/tree/master/tester) is provided in Go, and can be adapted to other languages if needed. +In particular, a reference [Blocker implementation validator](https://github.com/ipfs-shipyard/nopfs/tree/master/tester) is provided in Go, and can be adapted to other languages if needed. ### Security diff --git a/src/index.html b/src/index.html index 761ca5585..7bce1bf3b 100644 --- a/src/index.html +++ b/src/index.html @@ -88,7 +88,7 @@

HTTP Gateways
 {% include 'list.html', posts: collections.httpGateways %}
-InterPlanetary Naming System
+IPNS
 The InterPlanetary Naming System (IPNS) is a naming system responsible for creating, reading and updating mutable pointers to data.
@@ -98,10 +98,17 @@
InterPlanetary Naming System
 Routing
 Content routing is the way to determine where to find a given CID on the network;
- specifically, which network peers provide specific CIDs.
+ specifically, which network peers provide specific CIDs.
 {% include 'list.html', posts: collections.routing %}
+Content Filtering
+ How IPFS service operators can control the content hosted on their nodes.
+ {% include 'list.html', posts: collections.filtering %}
 InterPlanetary Improvement Proposals
diff --git a/src/ipips/ipip-0383.md b/src/ipips/ipip-0383.md index 0db768d6b..29a1e2ab4 100644 --- a/src/ipips/ipip-0383.md +++ b/src/ipips/ipip-0383.md @@ -1,15 +1,28 @@ -# IPIP-383: Compact denylist format - -- Start Date: 2023-03-09 -- Related Issues: - - A different proposal: https://github.com/ipfs/specs/pull/340 +--- +title: "IPIP-0383: Compact Denylist Format" +date: 2023-03-09 +ipip: proposal +editors: + - name: Hector Sanjuan + github: hsanjuan + affiliation: + name: Protocol Labs + url: https://protocol.ai/ +relatedIssues: + - https://github.com/ipfs/specs/issues/298 + - https://github.com/ipfs/specs/pull/299 + - https://github.com/ipfs/specs/pull/340 +order: 383 +tags: ['ipips'] +--- ## Summary This IPIP introduces a line-based denylist format for content blocking on IPFS focused on simplicity, scalability and ease-of-use. -A reference Go implementation of a denylist parser and Blocker component for the Go-IPFS stack exists at https://github.com/ipfs-shipyard/nopfs. +A reference Go implementation of a denylist parser and Blocker component for +the Kubo (go-ipfs) stack exists at https://github.com/ipfs-shipyard/nopfs. ## Motivation @@ -23,7 +36,7 @@ can rely on and share. ## Detailed design -See [Compact Denylist Format](../COMPACT_DENYLIST_FORMAT.md). +See :cite[compact-denylist-format]. ## Design rationale @@ -51,7 +64,7 @@ The proposed design is part of a holistic approach to content-moderation for IPF - Ability to block based on CID codec (only allow Codec X) - Ability to block based on multihash function (”no identity multihashes”) - Ability to block IPNS names - + - Regarding the lists: - Compact format, compression friendly - Line-based so that updates can be watched @@ -66,7 +79,7 @@ The proposed design is part of a holistic approach to content-moderation for IPF - Lists support gateway http error hints (i.e. 
type of block) - `echo "/ipfs/cid" >> ~/.config/ipfs/denylists/custom` should work - Lists have a header section with information about the list. - + - Regarding the implementation: - Multiple denylists should be supported - Hot-reloading of list (no restart of IPFS required) @@ -96,9 +109,9 @@ Users and developers will benefit from a list format that is easy to work with b ### Compatibility -The current Protocol Labs denylist format -https://badbits.dwebops.pub/denylist.json can be easily converted into the -proposed compact format. This is shown at https://badbits.dwebops.pub/badbits.deny. +The old JSON-based Protocol Labs denylist format +[https://badbits.dwebops.pub/denylist.json](https://web.archive.org/web/20230610082307/https://badbits.dwebops.pub/denylist.json) can be easily converted into the +proposed compact format. This is shown at . ### Alternatives From ffa81ee09b0656cb24ca4097807720ae71f63fd9 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 26 Oct 2023 01:27:03 +0200 Subject: [PATCH 27/32] chore: fix markdown identation --- src/ipips/ipip-0383.md | 70 ++++++++++++++++++++---------------------- 1 file changed, 34 insertions(+), 36 deletions(-) diff --git a/src/ipips/ipip-0383.md b/src/ipips/ipip-0383.md index 29a1e2ab4..59084c8dd 100644 --- a/src/ipips/ipip-0383.md +++ b/src/ipips/ipip-0383.md @@ -33,7 +33,6 @@ The first step in a larger strategy to enable decentralized content moderation in IPFS setups is to agree in a denylist format that different implementations can rely on and share. - ## Detailed design See :cite[compact-denylist-format]. 
@@ -57,46 +56,46 @@ following aspects, which are a must for such a system: The proposed design is part of a holistic approach to content-moderation for IPFS for which we have the following detailed wishlist of items ultimately related to the denylist format: - Regarding the type of blocking: - - Ability to block content from being retrieved, stored or served by multihash - - Ability to block content that is referenced with an IPFS-path from a blocked multihash or traversing a blocked multihash. - - Ability to block by regexp-matching an IPFS path - - Ability to block based on content-type (i.e. only store/serve plain-text,and pictures) - - Ability to block based on CID codec (only allow Codec X) - - Ability to block based on multihash function (”no identity multihashes”) - - Ability to block IPNS names + - Ability to block content from being retrieved, stored or served by multihash + - Ability to block content that is referenced with an IPFS-path from a blocked multihash or traversing a blocked multihash. + - Ability to block by regexp-matching an IPFS path + - Ability to block based on content-type (i.e. only store/serve plain-text,and pictures) + - Ability to block based on CID codec (only allow Codec X) + - Ability to block based on multihash function (”no identity multihashes”) + - Ability to block IPNS names - Regarding the lists: - - Compact format, compression friendly - - Line-based so that updates can be watched - - Lists support CIDs - - Lists support CIDs+path (explicit) - - Lists support CIDs+path (implicit - everything referenced from CID) - - Lists support double-hashed multi-hashes - - Lists support double-hashed cid+path (current badbits format) - - Lists can be edited by hand on a text editor - - Lists are ipfs-replication-friendly (adding a new entry does not require downloading more than 1 IPFS block, to sync the list). - - Lists support comments - - Lists support gateway http error hints (i.e. 
type of block) - - `echo "/ipfs/cid" >> ~/.config/ipfs/denylists/custom` should work - - Lists have a header section with information about the list. + - Compact format, compression friendly + - Line-based so that updates can be watched + - Lists support CIDs + - Lists support CIDs+path (explicit) + - Lists support CIDs+path (implicit - everything referenced from CID) + - Lists support double-hashed multi-hashes + - Lists support double-hashed cid+path (current badbits format) + - Lists can be edited by hand on a text editor + - Lists are ipfs-replication-friendly (adding a new entry does not require downloading more than 1 IPFS block, to sync the list). + - Lists support comments + - Lists support gateway http error hints (i.e. type of block) + - `echo "/ipfs/cid" >> ~/.config/ipfs/denylists/custom` should work + - Lists have a header section with information about the list. - Regarding the implementation: - - Multiple denylists should be supported - - Hot-reloading of list (no restart of IPFS required) - - List removal does not require restart - - Minimal introduction of latency - - Minimal memory footprint (i.e. only read minimum amount of data into memory) - - Clean denylist module entrypoints (easy integration in current ipfs stack layers) - - Portable architecture (to other IPFS implementations). i.e. good interfaces to switch from an embedded implementation to something that could run separately, or embedded in other languages (i.e. even servicing multiple ipfs daemons). - - Text-based API. `ipfs deny ` and the like are nice-to-have but not a must to work with denylists. - - Security in mind: do not enable amplification attacks through lists (i.e. someone requesting a recursively blocked CID repeteadly over the gateway endpoint causes traversal of the whole CID-DAG. 
+ - Multiple denylists should be supported + - Hot-reloading of list (no restart of IPFS required) + - List removal does not require restart + - Minimal introduction of latency + - Minimal memory footprint (i.e. only read minimum amount of data into memory) + - Clean denylist module entrypoints (easy integration in current ipfs stack layers) + - Portable architecture (to other IPFS implementations). i.e. good interfaces to switch from an embedded implementation to something that could run separately, or embedded in other languages (i.e. even servicing multiple ipfs daemons). + - Text-based API. `ipfs deny ` and the like are nice-to-have but not a must to work with denylists. + - Security in mind: do not enable amplification attacks through lists (i.e. someone requesting a recursively blocked CID repeteadly over the gateway endpoint causes traversal of the whole CID-DAG. - Regarding list distribution: - - Ability to subscribe to multiple lists, and fetch any updates as they happen - - Ability to publish own lists so that others can subscribe to them - - List-subscription configuration or file details remote lists that the user is subscribed to. Editable by hand. - - Ability to subscribe to list subscriptions. - - List subscriptions can carry context (i.e. publisher, email, type of blocking. + - Ability to subscribe to multiple lists, and fetch any updates as they happen + - Ability to publish own lists so that others can subscribe to them + - List-subscription configuration or file details remote lists that the user is subscribed to. Editable by hand. + - Ability to subscribe to list subscriptions. + - List subscriptions can carry context (i.e. publisher, email, type of blocking. ### User benefit @@ -113,7 +112,6 @@ The old JSON-based Protocol Labs denylist format [https://badbits.dwebops.pub/denylist.json](https://web.archive.org/web/20230610082307/https://badbits.dwebops.pub/denylist.json) can be easily converted into the proposed compact format. 
This is shown at . - ### Alternatives This proposal is a follow up to a [previous proposal](https://github.com/ipfs/specs/pull/340), which has several shortcomings that make it not very practical when working at scale. Both list formats can co-exist though but ultimately it will be a matter of implementation support, and it would be better to settle on one thing. From 7233af26ec977fe77e2bd10a07c83cbe25ff519e Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 26 Oct 2023 01:44:02 +0200 Subject: [PATCH 28/32] ipip-383: remove mime rules Ref. https://github.com/ipfs/specs/pull/383#discussion_r1343792830 --- src/compact-denylist-format.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/src/compact-denylist-format.md b/src/compact-denylist-format.md index d7625660b..a3e875c24 100644 --- a/src/compact-denylist-format.md +++ b/src/compact-denylist-format.md @@ -145,10 +145,6 @@ hints: # Block IPNS key - blocks wrapped multihash. /ipns/k51qzi5uqu5dhmzyv3zac033i7rl9hkgczxyl81lwoukda2htteop7d3x0y1mf -# Block all mime types with exceptions -/mime/image/* -!/mime/image/gif - # Double-hash CID block # base58btc-sha256-multihash(QmVTF1yEejXd9iMgoRTFDxBv7HAz9kuZcQNBzHrceuK9HR) # Blocks bafybeidjwik6im54nrpfg7osdvmx7zojl5oaxqel5cmsz46iuelwf5acja @@ -435,8 +431,6 @@ Examples: ``` /ipfs/QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/photo* !/ipfs/QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/photo123.jpg -/mime/* -!/mime/text/plain !/ipns/my.domain /ipns/my.domain ``` From a5d850357c90757a3ab320a984e39ad41a9c7eb2 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 26 Oct 2023 01:49:38 +0200 Subject: [PATCH 29/32] chore: markdown lint MD049/emphasis-style --- src/compact-denylist-format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compact-denylist-format.md b/src/compact-denylist-format.md index a3e875c24..67c0ddd50 100644 --- a/src/compact-denylist-format.md +++ b/src/compact-denylist-format.md @@ -45,7 +45,7 @@ The denylist itself, after 
the header, is a collection of **block items** and block-item-specific hints. There are different flavours of block items, depending on whether we are blocking by CID, CID+path, Path, IPNS, using double-hashing etc. but the idea is that whether an item is blocked or not -SHOULD be decided directly and ideally, _prior to retrieval_. +SHOULD be decided directly and ideally, *prior to retrieval*. We include *negative block items* as well, with the idea of enabling denylists that are append-only. One of the main operational constraints we have seen is From fb7304ad305c317bdb9cc571dfc1c9be4c3497ba Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 26 Oct 2023 23:38:37 +0200 Subject: [PATCH 30/32] ipip-383: deduplicate content, editorial cleanup improves the flow of the document, hints are described only once now (3x before) in a single place, table of contents is easy to glance at and follows the flow of the file syntax --- src/compact-denylist-format.md | 291 ++++++++++++++++++--------------- 1 file changed, 160 insertions(+), 131 deletions(-) diff --git a/src/compact-denylist-format.md b/src/compact-denylist-format.md index 67c0ddd50..c5714c925 100644 --- a/src/compact-denylist-format.md +++ b/src/compact-denylist-format.md @@ -10,11 +10,18 @@ editors: affiliation: name: Protocol Labs url: https://protocol.ai/ + - name: Marcin Rataj + github: lidel + url: https://lidel.org/ + affiliation: + name: Protocol Labs + url: https://protocol.ai/ tags: ['filtering'] order: 1 --- -Denylists provide a way to indicate what content should be blocked by IPFS. +Denylists provide technical means for IPFS service operators to control +the content hosted on their nodes. ## Introduction @@ -94,23 +101,9 @@ limited the number of features and extensions to a minimum to start working with, leaving some ideas on the table and the door open to develop the format in future versions. 
-## Specification - -### Denylist file extension, locations and order - -While not pertaining to the denylist format itself, we introduce the following conventions about denylist files when they are stored in the local filesystem: - -- Denylist files MUST be named with the extension `.deny`. -- Implementations SHOULD look in `/etc/ipfs/denylists/` and - `$XDG_CONFIG_HOME/ipfs/denylists/` (default: `~/.config/ipfs/denylists`) for denylist files. -- Denylist files are processed in alphabetical order so that rules from later - denylists override rules from earlier denylists on conflict. - -### Denylist format - -#### Summary +## Example -The following example showcases the features and syntax of a compact denylist: +The following example showcases the features and [syntax](#file-syntax) of a compact denylist: ```yaml version: 1 @@ -171,9 +164,13 @@ hints: ``` -#### High level list format +## File syntax + +A denylist is a UTF-8 encoded text file made of an optional [header](#header) +terminated with `---` and a list of [block items](#block-item) separated by +newlines (`\n`). Block items can have optional [hints](#hints). -A denylist is a UTF-8 encoded text file made of an optional header and a list of blockitems separated by newlines (`\n`). Comment lines start with `#`. Empty lines are allowed. +Comment lines start with `#`. Empty lines are allowed. ```yaml [yaml_header] @@ -185,69 +182,59 @@ A denylist is a UTF-8 encoded text file made of an optional header and a list of ... ``` -#### Header +Lines should not be longer than 2MiB including the "\n" delimiter. + -The list header is a YAML block: +### Header + +The list header is an optional YAML block. - Must be valid YAML - Fully optional - 1 MiB (1048576 bytes) maximum size - Delimited by a line containing `---` at the end (document separator) +- Field names are case-sensitive -Known-fields (they must be lowercase): - -- `version`: the denylist format version. Defaults to 1 when not - specified. 
Implementations should reject parsing denylist versions that they - do not support. +Known fields: +- `version` + - The denylist format version. Defaults to 1 when not specified. + - Implementations SHOULD reject parsing denylist versions that they do not + support. - `name` - `description` - `author` -- `hints`: a map of *hints*. See below. - -The known fields-list may be expanded as this specification is developed. To -include custom information in the header, we recommend using custom fields or hints. - -Custom fields: - -- Fields starting with `x-` or `X-` are considered custom. Users can freely - include them in the header and implementations can support them as needed. -- Custom fields are not a property of each block item as "header hints" are - considered to be. - -In order to parse the header, implementations should read the denylist until a -`---` is found or the 1MiB limit is reached. If the `---` is found, they -should attempt parsing the header as YAML: - -- If parsing the header fails, they should abort and signal an error. -- If the size limit is reached, they should assume the list includes no header - and start parsing block items from the beginning of the denylist. A header - that was too large will be parsed line by line as block items and error - accordingly line per line, without causing excessive resource allocation. - -#### Hints - -A *hint* is a key-value duple associated to a `[block_item]`. the denylist as -a whole (part of the header), or to a specific `[block_item]`. - -A list of hints can optionally follow every `[block_item]` as show -above. Hints can also be specified in the denylist header ("header -hints"). This is equivalent to adding the same hints to every single -`[block_item]` in the denylist. Implementations should associate both the -specific and the header hints to every block rule. - -#### List body - -A denylist is made of lines which are made by a *block items* followed by zero or more space-separated hints. 
- -Lines should not be longer than 2MiB including the "\n" delimiter. - -#### Block item +- `hints` + - A map of optional global [hints](#hints). When present, SHOULD be applied + to every [block item](#block-item) on the list before applying per item + ones (if any). + +The list of known fields may be expanded in the future. Fields with names not +listed above are considered custom. List creators can freely include custom +fields in the header and implementations can support them as needed. +Implementations SHOULD ignore unknown header fields to ensure custom fields do +not impact parsing of the list. + +In order to parse the YAML header, implementations MUST: +1. Read the denylist until a `---` is found or the 1MiB limit is reached. + - If the size limit is reached, assume the list includes no + header and start parsing block items from the beginning of the denylist. A + header that was too large will be parsed line by line as block items and + error accordingly line per line, without causing excessive resource + allocation. +2. If the `---` is found, attempt parsing the header as YAML. + - If parsing the header fails, abort and signal an error. + +### Block item A block item represents a rule to enable content-blocking: - `PATH` elements are expected to be %-encoded, per [RFC 3986, section 2.1](https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding). - `CID` elements represent a CID (either V0 or V1). -- `CIDv0` are for us equivalent to baseb58btc-encoded sha256 multihashes although they are not the same thing (a CIDV0 carries implicit codec (dag-pb) and multibase information (b58btc). When we say a b58-encoded multihash needs to be extracted from the CID, this usually is a no-op in case of CIDv0s. + - Legacy CIDv0 are equivalent to baseb58btc-encoded sha256 multihashes + although they are not the same thing. A CIDv0 carries implicit codec + (dag-pb) and multibase information (b58btc). 
When we say a b58-encoded + multihash needs to be extracted from the CID, this is a no-op in + case of CIDv0s. Implementations must decide what to do when processing a denylist and an invalid block-item rule is found: @@ -255,7 +242,7 @@ Implementations must decide what to do when processing a denylist and an invalid - Abort parsing and return a general error OR - Continue processing the list, discarding unrecognized rules -##### `/ipfs/CID` +#### `/ipfs/CID` CID-rule: Blocks a specific multihash. If the CID is a V1, it blocks the multihash contained in it (CIDv0s are multihashes already). @@ -273,7 +260,7 @@ See note in `/ipfs/CID/*` below, as to why this rule may effectively block all s ::: -##### `/ipfs/CID/PATH` +#### `/ipfs/CID/PATH` IPFS-Path-Rule: Blocks the exact ipfs path that is referenced from the multihash embedded in the CID before attempting to resolve it. It does not block @@ -283,7 +270,7 @@ Note `/ipfs/CID/path` and `/ipfs/CID/path/` are equivalent rules. Blocking layer recommendation: PathResolver. -##### `/ipfs/CID/*` `/ipfs/CID/P/A/T/H*` +#### `/ipfs/CID/PATH*` IPFS-Path-Prefix-Rule: Blocks any multihash-path combination starting with the the given path prefix. `/*` includes the empty path. Thus `/ipfs/CID/*` @@ -309,19 +296,19 @@ When the rule `/ipfs/CID` exists and BlockService-level blocking ::: -##### `/ipns/IPNS` +#### `/ipns/NAME` IPNS-rule: Blocks the given IPNS name before resolving. It does not block the CID that it resolves to. -If the IPNS name is a domain name, it is blocked directy. +If the IPNS `NAME` is a domain name, it is blocked directy. -If the IPNS name is a CIDv1 (libp2p-key) or b58-encoded-multihash (CIDV0), +If the IPNS `NAME` is a CIDv1 (libp2p-key) or b58-encoded-multihash (CIDV0), then the blocking affects the underlying Multihash. Blocking layer recommendation: NameSystem. -##### `/ipns/IPNS/PATH` +#### `/ipns/NAME/PATH` IPNS-Path-rule: Blocks specifically the IPNS path, before resolving. 
Equivalent to `/ipfs/CID/PATH`. @@ -329,7 +316,7 @@ Blocking layer recommendation: There is no good place to implement this rule as the NameSystem only handles IPNS names (without paths), and the path.Resolver only handles already-resolved Paths. -##### `/ipns/NAME/*` `/ipns/NAME/PATH*` +#### `/ipns/NAME/PATH*` IPNS-Path-Prefix-Rule: Same as with the IPFS-Path-Prefix-Rule. @@ -337,7 +324,10 @@ Blocking layer recommendation: There is no good place to implement this rule as the NameSystem only handles IPNS names (without paths), and the path.Resolver only handles already-resolved Paths. -##### `/PATH` `/PATH/*` `/PATH*` + + +#### `//DOUBLE-HASH` Doublehash-Rule: Blocks using double-hashed item, which can be: -- The sha256-hex-encoded hash of `CIDV1_BASE32/PATH`: this is the legacy - badbits block anchor format. It can only block by CID and not by - multihash. When no path present, the trailing slash must be kept - (`CIDV1_BASE32/`). -- A b58-encoded multihash, corresponding to the Sum() of: +- (modern) a base58btc-encoded multihash, corresponding to the hash of either: + - An IPFS-Path: `b58-encoded-multihash/P/A/T/H` where the multihash is + extracted from the CID in `/ipfs/CID/P/A/T/H` + - CIDv1 needs to be converted to a raw Multihash in b58 multibase. CIDv0 is + already a valid b58 Multihash and requires no conversion. + - The `/P/A/T/H` component is optional and should not have a trailing `/`. - An IPNS-Path: - `/ipns/NAME` when the IPNS name is NOT a CID. - The b58-encoded-multihash extracted from an IPNS name when the IPNS name is a CID. - - An IPFS-Path: `b58-encoded-multihash/P/A/T/H` where the multihash is - extracted from the CID in `/ipfs/CID/P/A/T/H` (The multihash and the CID - are the same in the case of CIDV0). The `/P/A/T/H` component is optional - and should not have a trailing `/`. + - The modern Multihash form allows blocking by double-hash using any hashing + function of choice.
Implementations will have to hash requests using all + the hashing functions used in the denylist, so we recommend sticking to + one. +- (legacy) the sha256-hex-encoded hash of `CIDV1_BASE32/PATH` + - This is the legacy badbits block anchor format used before this + specification was created. + - When no path is present, the trailing slash must be kept (`CIDV1_BASE32/`). + - It can only block by CID and not by multihash, and is tied to the sha256 hash + function, which makes it inferior to the modern and more future-proof + b58-encoded multihash notation which supports use of alternative hash + functions. + +In a case where an implementation cannot distinguish a double-hashed rule between +a b58btc multihash (modern) and a sha256 hex-string (legacy), the content blocking +system MUST create deny rules for both. + +Content filtering of double-hashed entries SHOULD be applied in every logical +system acting as NameSystem, PathResolver or BlockService. + +In order to check for a matching rule, the PathResolver working with `/ipfs/CID/PATH` should: + +- (modern) Convert the CID to Multihash and hash `b58-multihash/PATH` without + trailing `/` with the hashing functions used in the denylist. Match against + declared double-hashes. +- (legacy) Convert the CID to CIDv1Base32 and hash `CIDV1BASE32/PATH` with the + hashing functions used in the denylist. Match against declared double-hashes. + An empty path means that the value to hash is `CIDV1BASE32/` (with the + trailing slash). This is the legacy hashing so the hashing function is + usually sha256 and the matched rules are legacy badbits anchor rules. + +The NameSystem (used only for `/ipns/*`) should: + +- If NAME is a CID (try parsing as CID first), extract the multihash, encode it with base58btc and hash it with the hashing functions used in the denylist. Match against declared double-hashes. +- Otherwise, assume NAME is a domain name: Hash `/ipns/NAME` with the hashing functions used in the denylist.
Match against declared double-hashes. + +The BlockService should: + +- (modern) Convert the CID to b58-encoded-multihash (that is CIDv0) and hash the CID string. +- (legacy) Convert the CID to `CIDV1BASE32/` (keeping the CID codec and adding a slash at the end) and hash it with the hashing functions used in the denylist. Match against declared double-hashes. -The latter form allows blocking by double-hash using any hashing function of -choice. Implementations will have to hash requests using all the hashing functions -used in the denylist, so we recommend sticking to one. +:::note -Conveniently, the latter form allows using a b58-encoded sha256 multihashes -(usual form of CIDv0 - `Qmxxx...`), so that double-hashes can be like: +The "modern" double-hashed items (b58-encoded-multihash) can be created with +existing CLI tools like +[Kubo](https://docs.ipfs.tech/how-to/command-line-quick-start/): ``` $ printf "QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/my/path" | ipfs add --raw-leaves --only-hash --quiet | ipfs cid format -f '%M' -b base58btc @@ -389,31 +417,9 @@ $ ipfs cid format -f '%M' -b base58btc bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpu QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768 ``` -Blocking layer recommendation: NameSystem + PathResolver + BlockService. - -In order to check for a matching rule, the PathResolver working with `CID/PATH` elements should: - -- Convert the CID to v1base32 and hash `CIDV1BASE32/PATH` with the hashing - functions used in the denylist. Match against declared double-hashes. An - empty path means that the value to hash is `CIDV1BASE32/` (with the trailing - slash). This is the legacy hashing so the hashing function is usually - sha256 and the matched rules are legacy badbits anchor rules. -- Convert the CID to CIDv0 and hash `CIDV0/PATH` without trailing `/` with the - hashing functions used in the denylist. Match against declared - double-hashes. 
- -The NameSystem should: - -- If NAME is a CID (try parsing as CID first), extract the multihash, encoded with base58btc and hash it with the hashing functions used in the denylist. Match against declared double-hashes. -- Otherwise, assume NAME is a domain name: Hash `/ipns/NAME` with the hashing functions used in the denylist. Match against declared double-hashes. - -The BlockService should: - -- Convert the CID to `CIDV1BASE32/` (keeping the CID codec and adding a slash at the end) and hash it with the hashing functions used in the denylist. Match against declared double-hashes. - -- Convert the CID to b58-encoded-multihash (that is CIDv0) and hash the CID string. +::: -#### Allow (or negated) rules +#### Negated rules The specification syntax examples describe a `.deny` list of items to block (deny). @@ -445,53 +451,76 @@ is blocked by default, and only matching items are allowed. ::: -#### Hint list +#### Hints + +A *hint* is an optional key-value metadata duple associated to a [block item](#block-item). -A hint list is an optional space-separated list of hints associated with specific block items in the form: +Hints can be defined for the entire denylist when `hints` map is present in the +[header](#header), or per item, as space-separated list at the end of a [block +item](#block-item): ``` [block_item] hintA:v1 hintB:v2 hintC:v3 ``` -Block items and hints are separated by one or more consecutive instances of -the "space" character. +Local hint overrides a global one with the same key name. -### Test fixtures +## Denylist integration -Denylist parsing and correct behaviour can be tested using the -[test.deny](https://github.com/ipfs-shipyard/nopfs/blob/master/tester/test.deny) -denylist, which provides example rules and describes the expected behaviour in -detail. 
+### File extension, locations and order -In particular, a reference [Blocker implementation validator](https://github.com/ipfs-shipyard/nopfs/tree/master/tester) is provided in Go, and can be adapted to other languages if needed. +While not pertaining to the denylist format itself, we introduce the following conventions about denylist files when they are stored in the local filesystem: -### Security +- Denylist files MUST be named with the extension `.deny`. +- Implementations SHOULD look in `/etc/ipfs/denylists/` and + `$XDG_CONFIG_HOME/ipfs/denylists/` (default: `~/.config/ipfs/denylists`) for denylist files. +- Implementations MAY also look in their own configuration directory. +- Denylist files are processed in alphabetical order so that rules from later + denylists override rules from earlier denylists on conflict. -This proposal takes into account security: +### Security - Denylist headers and line-length limits are well specified to avoid malformed lists to cause things like large memory usage while parsing. + - Implementations MUST error when a parsed list is bigger than the limit defined in this specification. - Supported type of blocks have been thought out to avoid amplified consumption of resources or side effects (i.e. downloading of additional dag-blocks) during the implementation. + - Implementations SHOULD avoid retrieving content that is blocked by a denylist. - Paths are sanitized and follow the same encoding rules as URLs (RFC 3986), so that existing and safe parsing can be done with regular tooling. - Official and custom-hint systems allow the introduction of additional features that can co-exist with the specified format without needing to be supported. + - Implementations SHOULD ignore unsupported fields and hints. ### Privacy and User Control +The goal of content filtering is to empower operators of IPFS services with +tools to control what content is hosted and processed by their infrastructure.
+ +Implementations SHOULD allow the end user to configure denylists. + The main aspect regarding privacy in the scope of this specification has to do -with supporting the use of double-hashing in block items. +with supporting the use of [double-hashing](#double-hash) in block items. Double-hashing is particularly useful when the denylist is meant to be shared. Double-hashing: - Prevents readers of the denylist to know what the original content-address of the block item is, and therefore avoids making the denylist a directory of *bad* content. This is particularly useful for harmful content, where - solely accessing it is bad. + solely publishing the address (CID), and not the content, is bad. - Double-hashing does not exclude adding additional context via comments or hints - The presence of a single double-hashed block item makes necessary that the implementation hashes every CID and CID+path that needs to be checked, which has a performance impact. - In general, it is good that users can inspect the nature of the content blocked if they wish to, so we recommend not using double-hashing by default as it helps transparency (i.e. blocking due to copyright claims). -## Implementations +### Test fixtures + +Denylist parsing and correct behaviour can be tested using the +[test.deny](https://github.com/ipfs-shipyard/nopfs/blob/master/tester/test.deny) +denylist, which provides example rules and describes the expected behaviour in +detail. + +In particular, a reference [Blocker implementation validator](https://github.com/ipfs-shipyard/nopfs/tree/master/tester) is provided in Go, and can be adapted to other languages if needed. + + +### Implementations - [NoPFS](https://github.com/ipfs-shipyard/nopfs): An implementation of IPIP-383 which add supports for content blocking to the go-ipfs stack and particularly to Kubo.
+- [NOpfs](https://github.com/ipfs-shipyard/nopfs): An implementation of IPIP-383 which add supports for content blocking to the go-ipfs stack and particularly to [Kubo](https://github.com/ipfs/kubo). ## Copyright From 3eb75a7d56e4bd0fca18430f4a09863fd5db0c9a Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Fri, 27 Oct 2023 00:23:43 +0200 Subject: [PATCH 31/32] fix: doublehash example, editorials --- src/compact-denylist-format.md | 51 +++++++++++++++++----------------- 1 file changed, 26 insertions(+), 25 deletions(-) diff --git a/src/compact-denylist-format.md b/src/compact-denylist-format.md index c5714c925..ef313f22b 100644 --- a/src/compact-denylist-format.md +++ b/src/compact-denylist-format.md @@ -58,14 +58,14 @@ We include *negative block items* as well, with the idea of enabling denylists that are append-only. One of the main operational constraints we have seen is that a single item can cause a full denylist to be re-read, re-parsed and ultimately need a full restart of the application. We want to avoid that by -providing operators and implementors with the possiblity of just watching +providing operators and implementors with the possibility of just watching denylists for new items without then need to restart anything while new items -are added. This also gives the possiblity of storing an offset and seeking +are added. This also gives the possibility of storing an offset and seeking directly to it after application restarts. *negative block items* can also be used to make exceptions to otherwise more general rules. Another aspect that we have maintained in the back of our minds is the -possiblity of sharing lists using IPFS. The append-mostly aspect also plays a +possibility of sharing lists using IPFS. The append-mostly aspect also plays a role here, for lists can be chunked and DAG-ified and only the last chunk will change as the file grows. This makes our lists immediately friendly to content-addressing and efficient transmission over IPFS. 
However, the @@ -75,16 +75,16 @@ this spec. Beyond all of that, we put emphasis in making our format easily editable by users and facilitating integrations using scripts and with other applications (unrelated to the implementation of the parsing/blocking inside IPFS). We -conciously avoid JSON and other machine formats and opt for text and for +consciously avoid JSON and other machine formats and opt for text and for space-delimited items in a grep/sed/cut-friendly way. For example, we expect -that the following should just work accross implementations for adding and +that the following should just work across implementations for adding and blocking something new: ``` echo /ipfs/QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768 >> ~/.config/ipfs/custom.deny ``` -We conciously avoid defining any other API other than expecting +We consciously avoid defining any other API other than expecting implementations to honor blocking what is on the denylist and act accordingly when it is updated. CLI commands or API endpoint to modify list items etc. are outside the scope of this spec. Implementations how much information to @@ -138,7 +138,7 @@ hints: # Block IPNS key - blocks wrapped multihash. /ipns/k51qzi5uqu5dhmzyv3zac033i7rl9hkgczxyl81lwoukda2htteop7d3x0y1mf -# Double-hash CID block +# Double-hash CID block using sha2-256 hashing # base58btc-sha256-multihash(QmVTF1yEejXd9iMgoRTFDxBv7HAz9kuZcQNBzHrceuK9HR) # Blocks bafybeidjwik6im54nrpfg7osdvmx7zojl5oaxqel5cmsz46iuelwf5acja # and QmVTF1yEejXd9iMgoRTFDxBv7HAz9kuZcQNBzHrceuK9HR etc. by multihash @@ -150,7 +150,7 @@ hints: # /ipfs/bafyb4ieqht3b2rssdmc7sjv2cy2gfdilxkfh7623nvndziyqnawkmo266a/path # /ipfs/f01701e20903cf61d46521b05f926ba1634628d0bba8a7ffb5b6d5a3ca310682ca63b5ef0/path etc... 
# But not /path2 -//QmbK7LDv5NNBvYQzNfm2eED17SNLt1yNMapcUhSuNLgkqz +//gW813G35CnLsy7gRYYHuf63hrz71U1xoLFDVeV7actx6oX # Legacy CID double-hash block # sha256(bafybeiefwqslmf6zyyrxodaxx4vwqircuxpza5ri45ws3y5a62ypxti42e/) @@ -184,7 +184,6 @@ Comment lines start with `#`. Empty lines are allowed. Lines should not be longer than 2MiB including the "\n" delimiter. - ### Header The list header is an optional YAML block. @@ -273,7 +272,7 @@ Blocking layer recommendation: PathResolver. #### `/ipfs/CID/PATH*` IPFS-Path-Prefix-Rule: Blocks any multihash-path combination starting with the -the given path prefix. `/*` includes the empty path. Thus `/ipfs/CID/*` +given path prefix. `/*` includes the empty path. Thus, `/ipfs/CID/*` blocks the CID itself, and any paths. Examples: - `/ipfs/CID/*` : blocks CID (by multihash) and any path BEFORE resolving. @@ -301,7 +300,7 @@ When the rule `/ipfs/CID` exists and BlockService-level blocking IPNS-rule: Blocks the given IPNS name before resolving. It does not block the CID that it resolves to. -If the IPNS `NAME` is a domain name, it is blocked directy. +If the IPNS `NAME` is a domain name, it is blocked directly. If the IPNS `NAME` is a CIDv1 (libp2p-key) or b58-encoded-multihash (CIDV0), then the blocking affects the underlying Multihash. @@ -380,8 +379,8 @@ In order to check for a matching rule, the PathResolver working with `/ipfs/CID/ - (legacy) Convert the CID to CIDv1Base32 and hash `CIDV1BASE32/PATH` with the hashing functions used in the denylist. Match against declared double-hashes. An empty path means that the value to hash is `CIDV1BASE32/` (with the - trailing slash). This is the legacy hashing so the hashing function is - usually sha256 and the matched rules are legacy badbits anchor rules. + trailing slash). This is the legacy hashing, the function is + sha256 and the matched rules are legacy badbits anchor rules. 
The NameSystem (used only for `/ipns/*`) should: @@ -399,24 +398,27 @@ The "modern" double-hashed items (b58-encoded-multihash) can be created with existing CLI tools like [Kubo](https://docs.ipfs.tech/how-to/command-line-quick-start/): +Convert any CID to its multihash with: + ``` -$ printf "QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/my/path" | ipfs add --raw-leaves --only-hash --quiet | ipfs cid format -f '%M' -b base58btc -QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8 +$ ipfs cid format -f '%M' -b base58btc bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze +QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768 ``` -The rule `//QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8` will block -`/ipfs/bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze/my/path`, -with `QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8` being the base58-encoded -multihash contained in -`bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze`. - -We can convert any CID to its multihash with: +Then, create a second multihash to be used in `//DOUBLE-HASH` rule that will be +blocking specific content path under the extracted multihash: ``` -$ ipfs cid format -f '%M' -b base58btc bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze -QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768 +$ printf "QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768/my/path" | ipfs block put --mhtype sha2-256 | ipfs cid format -f '%M' -b base58btc +QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8 ``` +The double-hash rule `//QmSju6XPmYLG611rmK7rEeCMFVuL6EHpqyvmEU6oGx3GR8` will block +`/ipfs/bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze/my/path`. + +The `QmecDgNqCRirkc3Cjz9eoRBNwXGckJ9WvTdmY16HP88768` is the multihash contained +in `bafybeihrw75yfhdx5qsqgesdnxejtjybscwuclpusvxkuttep6h7pkgmze`. + ::: #### Negated rules @@ -517,7 +519,6 @@ detail. 
In particular, a reference [Blocker implementation validator](https://github.com/ipfs-shipyard/nopfs/tree/master/tester) is provided in Go, and can be adapted to other languages if needed. - ### Implementations - [NOpfs](https://github.com/ipfs-shipyard/nopfs): An implementation of IPIP-383 which add supports for content blocking to the go-ipfs stack and particularly to [Kubo](https://github.com/ipfs/kubo). From a0cb44bd98c3719046ac5406329c1802abf95420 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 1 Feb 2024 16:07:31 +0100 Subject: [PATCH 32/32] ipip-383: final editorials --- src/compact-denylist-format.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/compact-denylist-format.md b/src/compact-denylist-format.md index ef313f22b..c176b09f9 100644 --- a/src/compact-denylist-format.md +++ b/src/compact-denylist-format.md @@ -107,7 +107,7 @@ The following example showcases the features and [syntax](#file-syntax) of a com ```yaml version: 1 -name: IPFSCorp blocking list +name: Example IPFSCorp blocking list description: A collection of bad things we have found in the universe author: abuse-ipfscorp@example.com hints: @@ -129,10 +129,10 @@ hints: !/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked/not !/ipfs/QmUboz9UsQBDeS6Tug1U8jgoFkgYxyYood9NDyVURAY9pK/blocked/exceptions* -# Block IPNS domain name +# Block DNSLink domain name /ipns/domain.example -# Block IPNS domain name and path +# Block DNSLink domain name and path /ipns/domain2.example/path # Block IPNS key - blocks wrapped multihash. @@ -521,7 +521,9 @@ In particular, a reference [Blocker implementation validator](https://github.com ### Implementations -- [NOpfs](https://github.com/ipfs-shipyard/nopfs): An implementation of IPIP-383 which add supports for content blocking to the go-ipfs stack and particularly to [Kubo](https://github.com/ipfs/kubo). 
+- [NOpfs](https://github.com/ipfs-shipyard/nopfs): A reference library implementation of IPIP-383 which adds support for content blocking to the go-ipfs stack.
+- [Kubo](https://github.com/ipfs/kubo): IPFS implementation, ships with built-in NOpfs implementation ([docs](https://github.com/ipfs/kubo/blob/master/docs/content-blocking.md))
+- [Rainbow](https://github.com/ipfs/rainbow/): A standalone IPFS Gateway implementation, ships with built-in NOpfs implementation

 ## Copyright