IPFS filtering to allow node operators to decide on content they are willing to serve #8492

Closed
3 tasks done
thibmeu opened this issue Oct 6, 2021 · 14 comments · Fixed by #10161
Labels
kind/feature A new feature P1 High: Likely tackled by core team if no one steps up

Comments

@thibmeu
Contributor

thibmeu commented Oct 6, 2021

Checklist

  • My issue is specific & actionable.
  • I am not suggesting a protocol enhancement.
  • I have searched on the issue tracker for my issue.

Description

Cloudflare recently open-sourced a fork of go-ipfs that provides filtering capabilities, grouped under the safemode command. The architecture is described in a dedicated blog post.

The system works by filtering certain CIDs when walking the DAG. This allows node operators to prevent certain CIDs from being provided, both by the HTTP gateway and to the P2P network.
CIDs to be filtered are stored in a blocklist. By default, this blocklist is in a dedicated mount of the datastore /safemode.

The actions that can be performed on a blocklist are (based on the proposed interface):

  • block to add content to the blocklist
  • unblock to remove it
  • purge to remove content from the blockstore. Ideally, this option could be extensible, for instance to purge a remote datastore or an HTTP cache
  • search to query the blocklist
  • audit to access the log of actions that have been performed against the blocklist
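The five actions above can be sketched as a small Go interface. This is only an illustration: the `Blocklist` interface, the `AuditEntry` type, and the in-memory `memBlocklist` are hypothetical names invented here, not the API of the Cloudflare fork, and CIDs are represented as plain strings.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// AuditEntry records one action performed against the blocklist.
type AuditEntry struct {
	Time   time.Time
	Action string // "block", "unblock" or "purge"
	CID    string
}

// Blocklist is a hypothetical interface mirroring the five actions listed above.
type Blocklist interface {
	Block(cid string)
	Unblock(cid string)
	Purge(cid string)
	Search(prefix string) []string
	Audit() []AuditEntry
}

// memBlocklist is an in-memory stand-in used only for illustration.
type memBlocklist struct {
	blocked map[string]bool
	store   map[string][]byte // stands in for the local blockstore
	log     []AuditEntry
}

func newMemBlocklist(store map[string][]byte) *memBlocklist {
	return &memBlocklist{blocked: map[string]bool{}, store: store}
}

func (b *memBlocklist) record(action, cid string) {
	b.log = append(b.log, AuditEntry{Time: time.Now(), Action: action, CID: cid})
}

// Block adds a CID to the blocklist.
func (b *memBlocklist) Block(cid string) {
	b.blocked[cid] = true
	b.record("block", cid)
}

// Unblock removes a CID from the blocklist.
func (b *memBlocklist) Unblock(cid string) {
	delete(b.blocked, cid)
	b.record("unblock", cid)
}

// Purge deletes the stored data for a CID; it does not change the blocklist.
func (b *memBlocklist) Purge(cid string) {
	delete(b.store, cid)
	b.record("purge", cid)
}

// Search returns the blocked CIDs that start with prefix, sorted.
func (b *memBlocklist) Search(prefix string) []string {
	var out []string
	for cid := range b.blocked {
		if strings.HasPrefix(cid, prefix) {
			out = append(out, cid)
		}
	}
	sort.Strings(out)
	return out
}

// Audit returns the log of actions performed so far.
func (b *memBlocklist) Audit() []AuditEntry { return b.log }

func main() {
	var bl Blocklist = newMemBlocklist(map[string][]byte{"cid-a": []byte("data")})
	bl.Block("cid-a")
	bl.Purge("cid-a")
	fmt.Println(bl.Search("cid-"), len(bl.Audit()))
}
```

Keeping purge separate from block reflects the list above: blocking stops content from being served, while purging removes already stored data.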

For convenience, the ipfs safemode command provides multiple ways to resolve content. From its documentation:

- IPFS address, i.e. /ipfs/<CID>
- IPNS address, i.e. /ipns/<hash_publickey>
- DNSLink address, i.e. /ipns/example.com
- HTTP URL, i.e. https://example.com/ or https://gateway.example.com/ipfs/<CID>
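As a rough sketch of how the four input forms could be told apart before resolution, the Go snippet below classifies an input string. The `Kind` type and `classify` function are hypothetical, and treating any `/ipns/` name that contains a dot as DNSLink is a heuristic assumed for this example, not the fork's documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind labels the four input forms listed above.
type Kind string

const (
	KindIPFS    Kind = "ipfs"
	KindIPNS    Kind = "ipns"
	KindDNSLink Kind = "dnslink"
	KindHTTP    Kind = "http"
	KindUnknown Kind = "unknown"
)

// classify guesses the form of an input string.
// Assumption: an /ipns/ name containing a dot is a DNSLink domain.
func classify(input string) Kind {
	switch {
	case strings.HasPrefix(input, "http://"), strings.HasPrefix(input, "https://"):
		return KindHTTP
	case strings.HasPrefix(input, "/ipfs/"):
		return KindIPFS
	case strings.HasPrefix(input, "/ipns/"):
		// Look only at the first path segment after /ipns/.
		name := strings.SplitN(strings.TrimPrefix(input, "/ipns/"), "/", 2)[0]
		if strings.Contains(name, ".") {
			return KindDNSLink
		}
		return KindIPNS
	default:
		return KindUnknown
	}
}

func main() {
	inputs := []string{
		"/ipfs/<CID>",
		"/ipns/<hash_publickey>",
		"/ipns/example.com",
		"https://gateway.example.com/ipfs/<CID>",
	}
	for _, in := range inputs {
		fmt.Println(in, "->", classify(in))
	}
}
```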

This is a proposed implementation, which satisfies some of the requirements laid out in ipfs/roadmap#64. It provides a more standardised approach for node operators to filter the content they are willing to provide.

The implementation was developed 3 years ago, and may not suit the current architecture of the go-ipfs project.

@thibmeu thibmeu added the kind/feature A new feature label Oct 6, 2021
@aschmahmann aschmahmann mentioned this issue Oct 8, 2021
6 tasks
@BigLep
Contributor

BigLep commented Oct 8, 2021

@thibmeu: thanks for bringing this up. I think we need to have a larger discussion about the kind of software Gateway Operators want to have before we keep proceeding with the status quo of go-ipfs serving the wide range of use cases from high-traffic gateways to desktop applications. go-ipfs maintainers are going to link discussions/notes that we're having in 2021Q4 on this topic to #8499. We'll certainly be engaging with Cloudflare as part of this process.

@BigLep BigLep added the status/blocked Unable to be worked further until needs are met label Jan 7, 2022
@BigLep BigLep added P3 Low: Not priority right now and removed status/blocked Unable to be worked further until needs are met labels Jun 3, 2022
@BigLep
Contributor

BigLep commented Jun 3, 2022

2022-06-03 conversation: we have the capability for this in go-bitswap per #8763 . If you're interested in contributing a plugin, that would be welcome. Otherwise this isn't a priority for the core maintainers because go-ipfs isn't really designed for large-scale operations, but we'll support operators on any reviews.

@BigLep
Contributor

BigLep commented Jun 3, 2022

@guseggert will link the issue that is actively being worked on right now that will make plugins easier to write/maintain.

@guseggert
Contributor

The issue is #7653, which allows arbitrary modifications to the go-ipfs dependency graph using a plugin, so that you can inject a custom exchange.Interface (e.g. a Bitswap instance w/ a customized filter).

@lidel lidel added P1 High: Likely tackled by core team if no one steps up and removed P3 Low: Not priority right now labels Aug 2, 2022
@lidel lidel self-assigned this Aug 2, 2022
@lidel
Member

lidel commented Aug 2, 2022

I believe it is time to prioritize this. There is enough need and interest around blocking bad bits for this to be part of Kubo, and not just a plugin.

Quick notes:

  1. denylists are not enough. it has to be allow and deny lists from the start
    • node operators have been asking not only for blocking bad bits, but also for a primitive that blocks everything and only allows specific CIDs and paths (e.g. a startup only wants to run a gateway to host their user data etc). if we don't tackle allowlists as part of this, we will end up with a franken-api in the future when allowlists are bolted on awkwardly.
  • MVP:
    • Add command namespace (tbd, ipfs rules --help is as good as any other) allowing user to build content policy around allow or deny (and set the default strategy).
    • We don't need to cover all use cases, it should be a low level primitive that allows people to implement their own strategies on top of (similar to firewall rules).
      • each cid / path has to be added as an explicit allow or deny entry
      • use the default policy when no entry matches
      • ability to mark added rule as sensitive (enables us to interop with https://badbits.dwebops.pub/) so it is never stored/exported in cleartext
      • use this during path resolution, bitswap and processing Gateway requests (covers the common asks from the community)
  1. import and export commands should be part of the UX, but we need to agree on the transport format – gathering feedback in IPIP: format for denylists for IPFS Nodes and Gateways specs#299
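The low-level primitive described in the MVP notes above (explicit allow or deny entries plus a default policy) could look roughly like the following Go sketch. `RuleSet`, `Action`, and the placeholder paths are hypothetical names for illustration; nothing here is an existing Kubo API, and only exact matches are handled.

```go
package main

import "fmt"

// Action is the outcome of evaluating a rule.
type Action int

const (
	Deny Action = iota
	Allow
)

func (a Action) String() string {
	if a == Allow {
		return "allow"
	}
	return "deny"
}

// RuleSet holds explicit allow/deny entries and a default policy,
// similar in spirit to firewall rules.
type RuleSet struct {
	Default Action
	rules   map[string]Action // key: CID or content path
}

// NewRuleSet creates an empty rule set with the given default policy.
func NewRuleSet(def Action) *RuleSet {
	return &RuleSet{Default: def, rules: make(map[string]Action)}
}

// Add registers an explicit entry for a CID or content path.
func (r *RuleSet) Add(target string, a Action) { r.rules[target] = a }

// Decide returns the explicit entry if one matches, else the default policy.
func (r *RuleSet) Decide(target string) Action {
	if a, ok := r.rules[target]; ok {
		return a
	}
	return r.Default
}

func main() {
	// Allowlist mode: deny everything except explicitly allowed content.
	rs := NewRuleSet(Deny)
	rs.Add("/ipfs/<allowed-cid>", Allow)
	fmt.Println(rs.Decide("/ipfs/<allowed-cid>"), rs.Decide("/ipfs/<other-cid>"))
}
```

Switching the default to `Allow` and adding `Deny` entries gives the denylist mode with the same data structure, which is the point of designing both together.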

@lidel
Member

lidel commented Aug 9, 2022

Another requirement from Infra team: ability to allow / deny specific PeerIDs.

This is a real-world need that I have also had in the past. In many cases, we struggle to create deterministic test fixtures. Making sure a node can't dial a specific peer and needs to get data from someone else requires disabling more and more internal services (mdns, routing, relays...) and is very brittle: the test setup can break the moment we introduce a new discovery method.

When we design ipfs rules it should encompass allow / deny rules for:

  • CIDs and content paths
    • note: iiuc we already have places to plug-in hooks (e.g., go-bitswap)
  • PeerIDs and multiaddrs
    • or any other Content Routing Hints (e.g., reframe endpoints passed by a client as a routing hint)

@BigLep
Contributor

BigLep commented Jan 9, 2023

For clarity, the new spec on this topic is ipfs/specs#340

@lidel
Member

lidel commented Mar 28, 2023

Linking related work by @hsanjuan for discoverability

@hsanjuan
Contributor

Note that it depends on: #9750. Nopfs injects itself as a NameSystem, path.Resolver and BlockService wrapper so that it can block things before Resolution and Retrieval.

More generally it depends on Kubo providing a more stable way of plugging-in a Blocker, which basically provides 2 methods:

  • IsCIDBlocked(CID) Result
  • IsPathBlocked(ipfsOripnsFullPath) Result

The Result can be a bool, but I'd prefer an error or any other type that can carry additional information about the block (for example, the error could explain the reason for the block, or tell the user which denylist triggered it).
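A minimal Go sketch of such a Blocker, with a result type that carries more than a bool, might look like this. `Status`, `mapBlocker`, and the string-based CID and path arguments are assumptions made for illustration; they are not the types used by nopfs or Kubo.

```go
package main

import (
	"fmt"
	"strings"
)

// Status is a hypothetical result type that carries the reason for a block.
type Status struct {
	Blocked bool
	Reason  string // e.g. the denylist file and line that matched
}

// Err converts a Status into an error that callers can return to users.
func (s Status) Err() error {
	if !s.Blocked {
		return nil
	}
	return fmt.Errorf("blocked: %s", s.Reason)
}

// Blocker mirrors the two methods described above.
type Blocker interface {
	IsCIDBlocked(cid string) Status
	IsPathBlocked(path string) Status
}

// mapBlocker is a toy implementation backed by maps of target -> reason.
type mapBlocker struct {
	cids  map[string]string
	paths map[string]string
}

func (b mapBlocker) IsCIDBlocked(cid string) Status {
	if reason, ok := b.cids[cid]; ok {
		return Status{Blocked: true, Reason: reason}
	}
	return Status{}
}

// IsPathBlocked checks the full path first, then the root CID of /ipfs/ paths.
func (b mapBlocker) IsPathBlocked(p string) Status {
	if reason, ok := b.paths[p]; ok {
		return Status{Blocked: true, Reason: reason}
	}
	parts := strings.Split(strings.TrimPrefix(p, "/"), "/")
	if len(parts) >= 2 && parts[0] == "ipfs" {
		return b.IsCIDBlocked(parts[1])
	}
	return Status{}
}

func main() {
	var b Blocker = mapBlocker{cids: map[string]string{"<cid>": "test.deny:9"}}
	fmt.Println(b.IsPathBlocked("/ipfs/<cid>/file.txt").Err())
	fmt.Println(b.IsCIDBlocked("<other-cid>").Err())
}
```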

@lidel
Member

lidel commented May 16, 2023

Relevant discussion happened today in 2023-05-16-Content-Routing-WG-11.

Summary of the burning need at hand
  • Priority for IPFS ecosystem is to allow operators to have built-in support for self-managed or publicly available lists like https://badbits.dwebops.pub
    • at the same time we don't want to hard-code anything, nor spend too much time on opinionated update polling/update/composition logic
  • MVP in Kubo would be to observe a file on disk in the format from IPIP-383 and apply the deny rules present in it.
    • this simple primitive should be easy to implement, but at the same time allows operators to compose a deny list outside Kubo, and also manage logic responsible for fetching updates, if a third-party list is used.
Loose implementation scope/direction
  • reuse code from ipfs-shipyard/nopfs
  • follow path conventions from IPIP-383 + Kubo-specific location at $IPFS_PATH/denylists/*.deny
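To make the Kubo-specific location concrete, here is a small Go sketch that lists the `*.deny` files under `<IPFS_PATH>/denylists`. The `findDenylists` helper and the file names in `demo` are hypothetical; the sketch only shows the directory convention, not how Kubo or nopfs actually load the files.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// findDenylists returns the *.deny files found under <ipfsPath>/denylists.
func findDenylists(ipfsPath string) ([]string, error) {
	matches, err := filepath.Glob(filepath.Join(ipfsPath, "denylists", "*.deny"))
	if err != nil {
		return nil, err
	}
	sort.Strings(matches)
	return matches, nil
}

// demo builds a throwaway directory layout and returns the base names found.
// The file names are placeholders invented for this example.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "ipfs-path-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	dir := filepath.Join(root, "denylists")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return nil, err
	}
	for _, name := range []string{"badbits.deny", "local.deny", "notes.txt"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			return nil, err
		}
	}

	found, err := findDenylists(root)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(found))
	for _, p := range found {
		names = append(names, filepath.Base(p))
	}
	return names, nil
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```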

@hsanjuan
Contributor

hsanjuan commented Oct 2, 2023

Hi, I chatted briefly with @BigLep and there seems to be interest to bring this MVP to Kubo. I can do that.

To summarize:

  • We have a working plugin https://github.com/ipfs-shipyard/nopfs/tree/master/nopfs-kubo-plugin that watches denylists on disk. Any appends to those denylists are processed, so you can run echo "/ipfs/<cid>" >> denylist without having to restart Kubo.
  • There is no "watch" system implemented yet. I have thoughts about this (unixfs files + pubsub) but I think this should come later.
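The "appends are processed" behaviour from the first bullet can be approximated by remembering a read offset and only parsing what was added since the last check. The Go sketch below is an assumption-laden illustration, not the nopfs implementation: the `tailer` type is hypothetical, it polls on demand instead of using file notifications, and skipping blank lines and lines starting with `#` is assumed here as the comment convention.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// tailer remembers how far into a denylist file it has already read.
type tailer struct {
	path   string
	offset int64
}

// readNew returns the complete lines appended since the previous call.
// A trailing line without a newline is left for the next call.
func (t *tailer) readNew() ([]string, error) {
	f, err := os.Open(t.path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	if _, err := f.Seek(t.offset, io.SeekStart); err != nil {
		return nil, err
	}
	r := bufio.NewReader(f)
	var lines []string
	for {
		line, err := r.ReadString('\n')
		if err == io.EOF {
			return lines, nil // partial line (if any) is not consumed
		}
		if err != nil {
			return lines, err
		}
		t.offset += int64(len(line))
		s := strings.TrimSpace(line)
		if s != "" && !strings.HasPrefix(s, "#") {
			lines = append(lines, s)
		}
	}
}

// runDemo writes two rules, reads them, appends a third, and reads again.
func runDemo() ([]string, []string, error) {
	f, err := os.CreateTemp("", "example-*.deny")
	if err != nil {
		return nil, nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.WriteString("/ipfs/<cid-1>\n/ipfs/<cid-2>\n"); err != nil {
		return nil, nil, err
	}
	t := &tailer{path: f.Name()}
	first, err := t.readNew()
	if err != nil {
		return nil, nil, err
	}
	if _, err := f.WriteString("/ipfs/<cid-3>\n"); err != nil {
		return nil, nil, err
	}
	second, err := t.readNew()
	return first, second, err
}

func main() {
	first, second, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(first, second)
}
```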

What we need:

  • We need to settle the IPIP - IPIP-0383: Compact Denylist Format specs#383 (I would like to at least)
  • We need to decide how to bring noPFS into Kubo (as experimental feature):
    • The most straightforward way is to bring it as a pre-compiled plugin into /plugin/plugins (I personally lean this way as it is cleaner, especially for an experimental feature).
    • We can also wrap things by hand during setup. As a reminder, blocking checks are performed on:
      • Blockservice - CID
      • NameSystem - IPNS blocking
      • IPLD/UnixFS Path Resolvers - Path blocking.
    • I think it is worth integrating the library part of NoPFS into Boxo.
  • Additionally, we can start discussing (but no need to decide) about:
    • Subscribing to lists
    • More integration: i.e. gateway responses could detect Blocked errors and look into the rule hints for http return code values.

Is there a meeting we can use to go over this so I can start work ASAP? (My window of availability is 4 weeks.)

@BigLep
Contributor

BigLep commented Oct 2, 2023

2023-10-02 conversation:

  • It should be enabled by default
  • Have a config flag or environment variable for disabling this functionality.
  • This is not experimental: it is built in and on by default.
  • FYI: There is a "Bad Content Working Group". Agreed with them that this work shouldn't be blocked going forward, even though there are larger ideals of servicing other ecosystems.
  • Agreed that it can be a pre-compiled plugin.

@hsanjuan
Contributor

hsanjuan commented Oct 3, 2023

I have done another round on the spec. There's an open question about defaulting or strongly suggesting a function for double-hashing... input is welcome as I understand that sha256 is not the best.

Note that every CID and CID+Path needs to be double-hashed when a list includes a double-hashed block item in it, so the function should aim to minimize the perf impact.
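The performance concern can be illustrated with a simplified Go sketch: once a list holds at least one hashed entry, every candidate has to be hashed before lookup. This is not the encoding defined by the spec. The `hashedList` type is hypothetical, and hex-encoded SHA-256 over a plain `cid/path` string is an assumption chosen only to keep the example short.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest hashes a "cid" or "cid/path" string.
// Assumption: hex SHA-256 stands in for whatever function the spec settles on.
func digest(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// hashedList stores only digests, so the blocked items never appear in cleartext.
type hashedList struct {
	digests map[string]bool
}

func newHashedList() *hashedList {
	return &hashedList{digests: make(map[string]bool)}
}

// AddHashed registers an already-hashed entry, as it would appear in a list.
func (l *hashedList) AddHashed(d string) { l.digests[d] = true }

// IsBlocked hashes the candidate and looks it up. The hashing cost is paid
// on every check as soon as the list contains one hashed entry.
func (l *hashedList) IsBlocked(cidAndPath string) bool {
	if len(l.digests) == 0 {
		return false // no hashed entries: skip the hashing work entirely
	}
	return l.digests[digest(cidAndPath)]
}

func main() {
	l := newHashedList()
	l.AddHashed(digest("<cid>/secret/path"))
	fmt.Println(l.IsBlocked("<cid>/secret/path"), l.IsBlocked("<other-cid>"))
}
```

Because the hash runs on the hot path of every request, a cheaper function directly reduces per-request overhead, which is why the choice of function matters.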

hsanjuan added a commit that referenced this issue Oct 3, 2023
Fixes #8492.

This introduces "nopfs" as a preloaded plugin into Kubo.

It automatically makes Kubo watch *.deny files found in:

- /etc/ipfs/denylists
- $XDG_CONFIG_HOME/ipfs/denylists
- $IPFS_PATH/denylists

(files need to be present before boot in order to be watched).

Debug logging can be enabled with `GOLOG_LOG_LEVEL="nopfs=debug"`.

All blocks are logged to "nopfs-blocks", so logging requests for blocked
content can be achieved with
`GOLOG_LOG_LEVEL="nopfs-blocks=info"`. Interactive users will receive the
error as response to their commands too.

One particularity to keep in mind is that GetMany() will silently drop blocked
blocks from the response (an error and a warning are logged). AddMany() will act
similarly and avoid adding blocked blocks.
hsanjuan added a commit that referenced this issue Oct 3, 2023
Fixes #8492.

This introduces "nopfs" as a preloaded plugin into Kubo.

It automatically makes Kubo watch *.deny files found in:

- /etc/ipfs/denylists
- $XDG_CONFIG_HOME/ipfs/denylists
- $IPFS_PATH/denylists

(files need to be present before boot in order to be watched).

Debug logging can be enabled with `GOLOG_LOG_LEVEL="nopfs=debug"`.

All blocks are logged to "nopfs-blocks", so logging requests for blocked
content can be achieved with
`GOLOG_LOG_LEVEL="nopfs-blocks=warn"`:

```
WARN (...) QmRFniDxwxoG2n4AcnGhRdjqDjCM5YeUcBE75K8WXmioH3: blocked (test.deny:9)
```

Interactive/gateway users will also receive errors as responses but with fewer details:

```
Error: /ipfs/QmQvjk82hPkSaZsyJ8vNER5cmzKW7HyGX5XVusK7EAenCN is blocked and cannot be provided
```

One particularity to keep in mind is that GetMany() will silently drop blocked
blocks from the response (warnings are logged). AddMany() will act
similarly and avoid adding blocked blocks.

The code implementing all this is actually in nopfs:

- https://github.com/ipfs-shipyard/nopfs (main library)
- https://github.com/ipfs-shipyard/nopfs/tree/master/ipfs (wrappers)

The interpretation of the list rules and block detection is well tested, but a
general review might be in order.
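The GetMany() behaviour described in this commit message (blocked items are dropped from the stream and only a warning is emitted) can be sketched in Go as a channel filter. This is an illustration under assumptions, not the nopfs wrapper code: CIDs are plain strings here, and `filterBlocked` and `collect` are hypothetical helpers.

```go
package main

import "fmt"

// filterBlocked forwards every CID from in that isBlocked does not reject.
// Rejected CIDs are passed to warn and silently left out of the output.
func filterBlocked(in <-chan string, isBlocked func(string) bool, warn func(string)) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for c := range in {
			if isBlocked(c) {
				warn(c)
				continue
			}
			out <- c
		}
	}()
	return out
}

// collect runs a slice of CIDs through the filter and gathers both outcomes.
func collect(cids []string, blocked map[string]bool) (kept, dropped []string) {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, c := range cids {
			in <- c
		}
	}()
	out := filterBlocked(in,
		func(c string) bool { return blocked[c] },
		func(c string) { dropped = append(dropped, c) },
	)
	for c := range out {
		kept = append(kept, c)
	}
	return kept, dropped
}

func main() {
	kept, dropped := collect(
		[]string{"<cid-1>", "<cid-2>", "<cid-3>"},
		map[string]bool{"<cid-2>": true},
	)
	fmt.Println("kept:", kept, "dropped:", dropped)
}
```

The caller receives fewer results than requested and no error, so code that counts responses needs to tolerate missing entries.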
@hacdias hacdias assigned hsanjuan and unassigned hacdias Oct 3, 2023
lidel added a commit that referenced this issue Oct 28, 2023
Fixes #8492

This introduces "nopfs" as a preloaded plugin into Kubo
with support for denylists from ipfs/specs#383

It automatically makes Kubo watch *.deny files found in:

- /etc/ipfs/denylists
- $XDG_CONFIG_HOME/ipfs/denylists
- $IPFS_PATH/denylists

* test: Gateway.NoFetch and GatewayOverLibp2p

adds missing tests for "no fetch" gateways one can expose,
in both cases the offline mode is done by passing custom
blockservice/exchange into path resolver, which means
global path resolver that has nopfs intercept is not used,
and the content blocking does not happen on these gateways.

* fix: use offline path resolvers where appropriate

this fixes the problem described in
#10161 (comment)
by adding explicit offline path resolvers that are backed
by offline exchange, and using them in NoFetch gateways
instead of the default online ones

---------

Co-authored-by: Henrique Dias <hacdias@gmail.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
@lidel
Member

lidel commented Oct 28, 2023

A minimal implementation of IPIP-383 from #10161 landed in master branch and is scheduled to be released in Kubo 0.24-rc1 for feedback. More details in /docs/content-blocking.md

@BigLep BigLep mentioned this issue Nov 9, 2023
11 tasks
7 participants
@BigLep @lidel @guseggert @hsanjuan @thibmeu @hacdias and others