
js-ipfs pinning performance #2197

Closed

dirkmc opened this issue Jun 24, 2019 · 4 comments
Labels
  • exp/wizard: Extensive knowledge (implications, ramifications) required
  • exploration
  • kind/support: A question or request for support
  • status/ready: Ready to be worked

Comments

@dirkmc
Contributor

dirkmc commented Jun 24, 2019

ipfs.add() performance degrades severely once the number of pins exceeds 8192

Background

Users can pin a file or a block to prevent it from being garbage collected.

The pinning module maintains two sets of pins:

  • direct: the CID of the block that is pinned
  • recursive: the CID of the root node of a DAG of blocks

These pin sets are stored in the block store with the following structure:

  • if the number of pins is less than 8192, create a node with
    • 256 links pointing to an empty block
    • a link pointing to each pinned block
  • if the number of pins is greater than 8192
    • distribute the pins deterministically amongst 256 buckets (a sketch of the hashing follows this list)
    • each bucket is a node with one or more pins, with the same structure described above (i.e. buckets that themselves exceed 8192 pins distribute their pins into sub-buckets, and so on)
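For illustration, here is a minimal JavaScript sketch of the deterministic bucket choice. It is not the actual js-ipfs code (the real pin set may mix other data, such as a seed or depth, into the hash input), but it shows the principle: the bucket is a pure function of the pin, so the same CID always lands in the same bucket.

```js
// Minimal sketch: deterministically assign a pinned CID to one of 256 buckets.

// 32-bit FNV-1a over a byte array
function fnv1a (bytes) {
  let hash = 0x811c9dc5
  for (const b of bytes) {
    hash ^= b
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

function bucketFor (cidBytes, numBuckets = 256) {
  return fnv1a(cidBytes) % numBuckets
}

// Example with some made-up "CID" bytes
const fakeCid = Uint8Array.from([0x12, 0x20, 0xab, 0xcd, 0xef])
console.log(bucketFor(fakeCid)) // always the same bucket for the same bytes
```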

Performance

A pin set with fewer than 8192 pins is stored in a single DAG node. Once there are more than 8192 pins, they are distributed among 256 buckets, each with its own DAG node. Each time a new pin is added to the set, the distribution across the group of buckets is recalculated and the bucket nodes are written to the block store. The distribution is deterministic, so in reality only one bucket changes each time a new pin is added.

For example, if we simplify and say there are 8 buckets, with 5 pins (A - E):
[] [D] [] [EA] [] [C] [] [B]
When we add pin F only one bucket changes:
[] [D] [] [EA] [] [C] [] [BF]

We can improve performance by adding a cache that remembers the structure of the pin sets and only writes the nodes that change to the block store (instead of writing all nodes each time a pin is added or removed). This improves ipfs.add() performance dramatically once we exceed 8192 pins (a sketch of the caching idea follows the chart below):

[Figure: pinning-perf (ipfs.add() benchmark chart)]
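To make the caching idea concrete, here is a hypothetical sketch rather than the actual js-ipfs implementation: the cache mirrors the 256 buckets in memory, and adding a pin re-serializes and persists only the one bucket that changed. writeNode is an invented stand-in for whatever builds the bucket's DAG node and puts it in the block store.

```js
// Hypothetical sketch of the cache: only the bucket that changed is
// rebuilt and written back to the block store on each new pin.
// writeNode() is an invented stand-in for persisting a bucket's DAG node.

class PinSetCache {
  constructor (numBuckets = 256) {
    // in-memory mirror of each bucket's pins
    this.buckets = Array.from({ length: numBuckets }, () => [])
  }

  async addPin (index, cidBytes, writeNode) {
    this.buckets[index].push(cidBytes)
    // without the cache, every bucket node would be rebuilt and written here;
    // with it, only the single dirty bucket is persisted
    await writeNode(index, this.buckets[index])
  }
}

// Usage: in practice the index would come from the deterministic hash
// (bucketFor() in the earlier sketch)
const cache = new PinSetCache()
cache.addPin(42, Uint8Array.from([0x12, 0x20, 0x01]), async (i, pins) => {
  console.log(`writing bucket ${i} with ${pins.length} pin(s)`)
})
```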

Memory usage

The pinner uses fnv1a to distribute pins. fnv1a outputs a number (8 bytes), so if we also use it for cache keys each key will be 8 bytes. Each pin is represented by a DAG link pointing to the pinned CID. The DAG link has:

  • a name (the empty string)
  • a size (number)
  • a cid
    • version (number)
    • codec (e.g. 'dag-pb')
    • multihash (the hash itself)
    • multibaseName (e.g. 'base58btc')

So, rounding up, a DAG link requires about 128 bytes of memory; 10k pins therefore need roughly 1.3MB of memory for the cache.
Note: Storing the DAGLink object (rather than just the CID as a Buffer) saves us from having to re-create a lot of JavaScript objects, but uses about twice the memory. However, this memory would need to be allocated anyway each time a pin is added.
Note: The cache is not used if there are fewer than 8192 pins.
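As a quick back-of-the-envelope check of the numbers above (the 8-byte key and 128-byte-per-link figures are this issue's estimates, not measurements):

```js
// Rough memory estimate for the cache, using the figures above
const keyBytes = 8        // fnv1a hash used as the cache key
const linkBytes = 128     // rounded estimate for one DAGLink object
const pins = 10000

console.log((pins * linkBytes / 1e6).toFixed(2) + ' MB for the links')               // 1.28 MB
console.log((pins * (keyBytes + linkBytes) / 1e6).toFixed(2) + ' MB including keys') // 1.36 MB
```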

Command line

When invoking ipfs add from the command line with the daemon running, the HTTP API is loaded on each invocation. This can take several times longer than the add operation itself, so we should look at optimizing it.

@alanshaw added the exp/wizard, exploration, kind/support and status/ready labels on Jul 10, 2019
@achingbrain
Member

Does anyone have any context as to why we store pinsets as DAGs instead of storing individual CIDs we don't want to gc in leveldb? I'm thinking having dedicated datastores for pinned CIDs might be more performant than having to perform all these operations every time we read/write the pinsets.

cc @Stebalien @daviddias

@Stebalien
Member

> Does anyone have any context as to why we store pinsets as DAGs instead of storing individual CIDs

The goal was to eventually store the entire repo in a single DAG.

IMO, we should do this but at a different layer. I'd like to:

  1. Store the pin set as key/values in the datastore (easy, can use datastore queries, performant, etc.; a rough sketch of this idea follows below).
  2. Create a new dag-backed datastore (using a tiered HAMT) for everything except blocks.
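To illustrate option 1, here is a rough sketch of pins stored as key/value entries. A plain Map stands in for the repo datastore, and the /pins/ key prefix and value shape are assumptions for illustration, not necessarily what a real implementation would use.

```js
// Rough sketch of option 1: pins as key/value entries instead of a DAG.
// A Map stands in for the repo datastore; the key layout and value shape
// are assumptions for illustration only.

const pinstore = new Map()

const keyFor = (cidString) => `/pins/${cidString}`   // hypothetical key layout

function addPin (cidString, type = 'recursive') {
  pinstore.set(keyFor(cidString), { type })
}

function rmPin (cidString) {
  pinstore.delete(keyFor(cidString))
}

function isPinned (cidString) {
  return pinstore.has(keyFor(cidString))
}

function lsPins () {
  // a real datastore would expose this as a prefix query
  return [...pinstore].map(([key, value]) => ({
    cid: key.slice('/pins/'.length),
    type: value.type
  }))
}

// Usage
addPin('QmExampleRootCid')
addPin('QmExampleBlockCid', 'direct')
console.log(isPinned('QmExampleRootCid')) // true
console.log(lsPins())
```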

@achingbrain
Member

Great, I think 1) is what I'm suggesting so I'll give that a go and see what the performance difference is like.

@achingbrain
Member

This has been fixed by #2771
