This repository has been archived by the owner on Apr 16, 2020. It is now read-only.

Providing System #84

michaelavila opened this issue Jul 30, 2019 · 0 comments


Why Improving Provider Strategies is Important

Our current scheme of breaking content up into 256kb blocks and then providing each of those blocks to the rest of the network places a large burden on both announcing and discovery. Regardless of the performance of the DHT, or whatever underlying mechanism is used, our goals for the amount of content we want to live in IPFS are ambitious enough that announcing (aka providing) every block to the network is not viable.

I know of no official, repeated benchmarks of providing times, but lately individual announcements (the network part of a provide) have been taking several minutes.


Current State of Provider Strategies in go-ipfs (+ some history)

There exists an experimental providing system that I’ve been working on with the hope that it will eventually replace the current providing system in go-ipfs. The current system is hardly a system at all: it’s just a few lines of code in the go-bitswap repository that aggressively provide every block bitswap comes into contact with, which is inflexible. The goal of the new system is to give go-ipfs more control over which blocks are provided, based on the context the IPFS node is operating in. From there, strategies can be implemented on top.

This work has gone through several stages. The original epic tracking issue was ipfs/kubo#5774 (mine, that is; attempts at this work go back as far as 2015). My early attempts were a couple of failures and restarts. Some of the related work is here:

The new provider system was first introduced in ipfs/kubo#6068. Due to some changes that were made to bitswap providing, the gateways took a long time to reach the root blocks in order to provide them. The fix was simply to provide root blocks immediately in a separate goroutine, alongside the providing of the other blocks. We used the new provider system there in order to get it merged and generate feedback on a minimal provider setup.

Eventually, though, it became clear that the provider system’s biggest challenge was that it was an “all or nothing” change, which was proving difficult, so we decided to release the work under an experimental flag (admittedly something that should’ve happened a lot sooner, as it helped tremendously). From there, the following PR emerged (and merged!):

This PR, although simple looking, lays the foundations for the new provider system while also fulfilling a need that infra was asking for at the time: disabling providing without disabling content routing. In this PR the go-bitswap workers are disabled if the StrategicProviding experimental flag is set to true, the first behavior of this kind. The plan is to layer on the providing complexity from here. The PRs from earlier in the year are at various stages of completion, and all of them contain more than this PR (#6292) does; they are worth reviewing for the variety of things they addressed. Given this new flexibility, I was curious what others needed so we could try to do small releases of specific providing behavior, so I tried reaching out in ipfs/kubo#6221.

Towards the beginning of June ’19, ipfs-cluster asked for a simple version of the provider system to be extracted from go-ipfs (ipfs/kubo#6417) so that it could be used in ipfs-lite. This request resulted in the extraction of the provider system to https://github.com/ipfs/go-ipfs-provider. This was just before Team Week in Barcelona. A couple of weeks before Team Week I started looking exclusively into content routing issues in libp2p, as performance had degraded so much that the go-ipfs provider system wasn’t useful. After Team Week, I learned that what I was working on overlapped with what the Gateway Tiger Team was looking into, so I tried (somewhat successfully) to help.

During the Gateway Tiger Team work, just after Team Week, someone proposed introducing a simplistic roots-only provide strategy to address some of the performance issues the gateways were experiencing. This change (ipfs/kubo#6472), while ultimately not merged, gives an idea of how the roots strategy will be implemented. The biggest difference is that the PR uses the “old” reprovider and forces a roots strategy, whereas we want to use the new reprovider with whatever strategy was specified.

Notably, a release (0.4.21, I believe), Team Week, and the provider extraction all happened at the same time, so some of that work had to be resolved in both the extracted and non-extracted versions. Then the extraction needed to be merged, which it was. Before Team Week, and before even the content routing issues prior to that, I was trying to get the following done:

  • Provide roots [WIP]
  • Prioritized queue [WIP]
  • Provide all

I’m still going to try to at least get PRs up for these things.


Recent Work Done and Relevant Outcomes

  • A limited providing mechanism has been introduced into the go-ipfs codebase and the option of disabling providing has been added to the go-bitswap codebase.

  • The initial providing strategy details are:

    • Configuration:
      • Provider.Strategy = none|roots|all|etc
        • Here’s how I can imagine each of these configurations working:
          • none - simple; no commands result in a provide
          • roots - for add/pin/get/etc. provide only roots; provide nodes in all other contexts (e.g. dag put)
          • all - for add/pin/get/etc. provide all added nodes recursively; provide nodes in all other contexts (e.g. dag put)
          • etc - needs to be figured out
      • ipfs add --provide-strategy=none|roots|all|etc
      • ipfs pin add --provide-strategy=none|roots|all|etc
      • ipfs get --provide-strategy=none|roots|all|etc
      • There are probably other strategies for specific situations (like some package-manager filesystem layout); we just haven’t gotten to that point yet.
  • Prioritized Queue modifications

    • Since the queue keys are not used for anything other than ordering, you can add priority by prepending something like /[0-9]/90238028234923408234/QmCID. It’s unlikely that you’ll need any data migrations, as all of the existing keys will still work; you may just want to run the node and let it finish whatever work is in the queue before doing the upgrade (not necessary, but a precaution to consider). This change is important because we need to preserve the “roots provided first” behavior that currently exists in go-ipfs, so prioritizing root nodes at the top of the queue is needed.
  • MFS Research for Package Managers

    • Basically, MFS uses the DAG service directly. Prior to this work, the DAG service would eventually do things that caused bitswap to come into contact with the blocks for the first time, which meant they were eventually provided. This will no longer work. Instead, MFS will also need to interact with the provider service in order to provide those blocks with whatever strategy MFS thinks is important. One important input to the provider strategies is the shape of the DAG when deciding what to provide. The provider (technically the reprovider) system currently learns this by interacting with the DAG service; if that’s not possible when dealing with MFS, it’s something that will need to be considered.
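To make the strategy configuration above concrete, here is a minimal sketch of how a none/roots/all dispatch could look. The `Strategy` type, the function name `cidsToProvide`, and the string-CID simplification are all assumptions for illustration, not the actual go-ipfs implementation:

```go
package main

import "fmt"

// Strategy mirrors the proposed Provider.Strategy config values
// (hypothetical names; the real config surface is still being designed).
type Strategy string

const (
	StrategyNone  Strategy = "none"
	StrategyRoots Strategy = "roots"
	StrategyAll   Strategy = "all"
)

// cidsToProvide picks which CIDs of a freshly added DAG get announced.
// root is the root CID; all is every CID in the DAG, root included.
func cidsToProvide(s Strategy, root string, all []string) []string {
	switch s {
	case StrategyNone:
		return nil // no command results in a provide
	case StrategyRoots:
		return []string{root} // only the root is announced
	case StrategyAll:
		return all // every node is announced recursively
	default:
		return all // unknown strategies fall back to providing everything
	}
}

func main() {
	dag := []string{"QmRoot", "QmChildA", "QmChildB"}
	for _, s := range []Strategy{StrategyNone, StrategyRoots, StrategyAll} {
		fmt.Printf("%s -> %v\n", s, cidsToProvide(s, dag[0], dag))
	}
}
```

Commands like `ipfs add` would resolve their `--provide-strategy` flag (falling back to the configured default) and pass the result into a dispatch of this shape.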

Technical Notes

Provider Queue

  • Keys: /provider-v1/queue/<timestamp-nano>/<cid>

The provider queue keys are structured so that the queue's ordering is achieved via the lexicographical sorting of keys in the datastore. To get the head of the queue, simply fetch the first entry; adding to the queue doesn't require crawling it. The keys are never parsed; they are only used for sorting.

  • Values: <cid>

The values are deserialized and provided.

Tracker

  • Keys: /provider/tracking/<cid>

The keys are structured this way so that querying for the presence of a cid is fast.

  • Values: <cid>

The values are deserialized and reprovided. In theory, we could instead store the last-provided timestamp and parse the cid from the key; that would allow us to skip reproviding cids that were provided recently enough.


Future Ideas for Improving Provider Strategies (and Projected Impact)

  • @Kubuxu has interesting information on the more advanced probabilistic strategies we eventually want to use. I don’t know if any of it is documented anywhere, but I think starting the process of detailing it out would help tremendously. The future impact would be pretty big, as we would all have clarity around an end goal to march towards. Some of it is described in Provide fewer nodes ipfs/kubo#6155

  • Spending that last little bit of time around Team Week learning about content routing in libp2p made me think there’s some duplication on our side. I think go-ipfs could interact more directly with the libp2p routing table, so that it wouldn’t need a separate tracking table for the provider mechanism. If the libp2p content router exposed enough of an API, the go-ipfs provider mechanism could likely be built directly on top of it. My 2 cents.

  • Honestly, more people.


TODO

  • Backtracking ***
  • Provide roots (strategy)
  • Prioritized queue
  • Provide all (strategy)
  • Provide in ALL appropriate locations/commands
    • add
    • pin
    • get
    • ls
    • cat
    • dag
    • block
    • object
    • refs
    • gateway handlers
    • pubsub
    • mfs ***
  • Provide fancy (strategy) ***
  • Provider commands
    • ipfs provider tracking - lists all tracked cids
    • ipfs provider reprovide - does the reprovide (currently: ipfs bitswap reprovide)