fix: ensure connmgr is smaller than autoscaled resource limits
Fixes #9545
Jorropo committed Jan 17, 2023
1 parent d90a9b5 commit b88d580
Showing 3 changed files with 39 additions and 13 deletions.
4 changes: 4 additions & 0 deletions config/init.go
@@ -110,6 +110,10 @@ const DefaultConnMgrGracePeriod = time.Second * 20
// type.
const DefaultConnMgrType = "basic"

// DefaultResourceMgrMinInboundConns is a MAGIC number that is probably a good
// enough number of inbound conns to be a good network citizen.
const DefaultResourceMgrMinInboundConns = 800

func addressesConfig() Addresses {
	return Addresses{
		Swarm: []string{
18 changes: 18 additions & 0 deletions core/node/libp2p/rcmgr_defaults.go
@@ -186,5 +186,23 @@ Run 'ipfs swarm limit all' to see the resulting limits.

	defaultLimitConfig := scalingLimitConfig.Scale(int64(maxMemory), int(numFD))

	// Simple checks to override autoscaling, ensuring limits make sense versus the connmgr values.
	// There are ways to break this, but this should catch most problems already.
	// We might improve this in the future.
	// See: https://github.com/ipfs/kubo/issues/9545
	if cfg.ConnMgr.Type == nil || cfg.ConnMgr.Type.String() != "none" {
		maxInboundConns := int64(defaultLimitConfig.System.ConnsInbound)
		if connmgrHighWaterTimesTwo := cfg.ConnMgr.HighWater.WithDefault(config.DefaultConnMgrHighWater) * 2; maxInboundConns < connmgrHighWaterTimesTwo {
			maxInboundConns = connmgrHighWaterTimesTwo
		}

		if maxInboundConns < config.DefaultResourceMgrMinInboundConns {
			maxInboundConns = config.DefaultResourceMgrMinInboundConns
		}

		defaultLimitConfig.System.StreamsInbound = int(maxInboundConns * int64(defaultLimitConfig.System.StreamsInbound) / int64(defaultLimitConfig.System.ConnsInbound))
		defaultLimitConfig.System.ConnsInbound = int(maxInboundConns)
	}

	return defaultLimitConfig, nil
}
30 changes: 17 additions & 13 deletions docs/libp2p-resource-management.md
@@ -40,19 +40,19 @@ libp2p's resource manager provides tremendous flexibility but also adds complexi
1. "The user who does nothing" - In this case Kubo attempts to give some sane defaults discussed below
based on the amount of memory and file descriptors their system has.
This should protect the node from many attacks.

1. "Slightly more advanced user" - They can tweak the default limits discussed below.
Where the defaults aren't good enough, a good set of higher-level "knobs" are exposed to satisfy most use cases
without requiring users to wade into all the intricacies of libp2p's resource manager.
The "knobs"/inputs are `Swarm.ResourceMgr.MaxMemory` and `Swarm.ResourceMgr.MaxFileDescriptors` as described below.

1. "Power user" - They specify overrides to computed default limits via `ipfs swarm limit` and `Swarm.ResourceMgr.Limits`.

### Computed Default Limits
With the `Swarm.ResourceMgr.MaxMemory` and `Swarm.ResourceMgr.MaxFileDescriptors` inputs defined,
[resource manager limits](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#limits) are created at the
[system](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-system-scope),
[transient](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-transient-scope),
and [peer](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#peer-scopes) scopes.
Other scopes are ignored (by being set to "[~infinity](#infinite-limits)").

@@ -68,8 +68,8 @@ The reason these scopes are chosen is because:
(e.g., bug in a peer which is causing it to "misbehave").
In the unintentional case, we want to make sure a "misbehaving" node doesn't consume more resources than necessary.

Within these scopes, limits are just set on
[memory](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#memory),
[file descriptors (FD)](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#file-descriptors), [*inbound* connections](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#connections),
and [*inbound* streams](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#streams).
Limits are set based on the `Swarm.ResourceMgr.MaxMemory` and `Swarm.ResourceMgr.MaxFileDescriptors` inputs above.
@@ -139,13 +139,17 @@ There is a go-libp2p issue ([#1928](https://github.com/libp2p/go-libp2p/issues/1
### How does the resource manager (ResourceMgr) relate to the connection manager (ConnMgr)?
As discussed [here](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#connmanager-vs-resource-manager)
these are separate systems in go-libp2p.
Kubo also configures the ConnMgr separately from ResourceMgr. There is no checking to make sure the limits between the systems are congruent.
Kubo also configures the ConnMgr separately from ResourceMgr. However, sanity checks may lock them together.

Ideally `Swarm.ConnMgr.HighWater` is less than `Swarm.ResourceMgr.Limits.System.ConnsInbound`.
This is so the ConnMgr can kick in and cleanup connections based on connection priorities before the hard limits of the ResourceMgr are applied.
`Swarm.ConnMgr.HighWater` needs to be less than `Swarm.ResourceMgr.Limits.System.ConnsInbound` for the configuration to make sense.
This lets the ConnMgr kick in and clean up connections based on connection priorities before the hard limits of the ResourceMgr are applied.
If `Swarm.ConnMgr.HighWater` is greater than `Swarm.ResourceMgr.Limits.System.ConnsInbound`,
existing low priority idle connections can prevent new high priority connections from being established.
The ResourceMgr doesn't know that the new connection is high priority and simply blocks it because of the limit it's enforcing.

To ensure this holds with the default config, Kubo's autoscaling raises `Swarm.ResourceMgr.Limits.System.ConnsInbound` to at least
twice `Swarm.ConnMgr.HighWater` (and never below 800) unless `Swarm.ConnMgr.Type` is `"none"`.
It also scales `Swarm.ResourceMgr.Limits.System.StreamsInbound` to keep the same streams-to-connections ratio.
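
The adjustment described above can be sketched in isolation. This is a minimal illustration rather than Kubo's actual code path: `adjustInbound` and `minInboundConns` are hypothetical names, the input numbers are made up, and the zero-connection guard is an added assumption that does not appear in the commit.

```go
package main

import "fmt"

// minInboundConns mirrors config.DefaultResourceMgrMinInboundConns from the
// diff. The name is a hypothetical stand-in used only in this sketch.
const minInboundConns = 800

// adjustInbound raises the autoscaled inbound connection limit to at least
// twice the ConnMgr high water mark and at least minInboundConns, then scales
// the inbound stream limit so the streams-to-connections ratio is preserved.
func adjustInbound(connsInbound, streamsInbound int, highWater int64) (int, int) {
	maxConns := int64(connsInbound)
	if hw2 := highWater * 2; maxConns < hw2 {
		maxConns = hw2
	}
	if maxConns < minInboundConns {
		maxConns = minInboundConns
	}
	// Assumption added for this sketch: avoid dividing by zero if the
	// autoscaled connection limit is not positive.
	if connsInbound <= 0 {
		return int(maxConns), streamsInbound
	}
	streams := int(maxConns * int64(streamsInbound) / int64(connsInbound))
	return int(maxConns), streams
}

func main() {
	// Hypothetical autoscaled limits (128 conns, 2048 streams) and HighWater of 900.
	conns, streams := adjustInbound(128, 2048, 900)
	fmt.Println(conns, streams) // prints "1800 28800"
}
```

With these inputs the connection limit becomes twice the high water mark, and the stream limit grows by the same factor (16 streams per connection before and after).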

### How does one see the Active Limits?
A dump of what limits are actually being used by the resource manager ([Computed Default Limits](#computed-default-limits) + [User Supplied Override Limits](#user-supplied-override-limits))
@@ -156,9 +160,9 @@ This can be observed with an empty [`Swarm.ResourceMgr.Limits`](https://github.c
and then [seeing the active limits](#how-does-one-see-the-active-limits).

### How does one monitor libp2p resource usage?
For [monitoring libp2p resource usage](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#monitoring),
various `*rcmgr_*` metrics can be accessed at the Prometheus endpoint at `{Addresses.API}/debug/metrics/prometheus` (default: `http://127.0.0.1:5001/debug/metrics/prometheus`).
There are also [pre-built Grafana dashboards](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager/obs/grafana-dashboards) that can be added to a Grafana instance.
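
As a rough illustration of reading the `*rcmgr_*` metrics without Grafana, the sketch below filters a Prometheus text exposition down to resource manager samples. `filterRcmgrMetrics` is a hypothetical helper, and the metric names in the sample text are illustrative only and may not match what a given Kubo version exports.

```go
package main

import (
	"fmt"
	"strings"
)

// filterRcmgrMetrics returns the sample lines of a Prometheus text exposition
// whose metric name contains "rcmgr_". Comment lines (# HELP, # TYPE) and
// blank lines are skipped.
func filterRcmgrMetrics(text string) []string {
	var out []string
	for _, line := range strings.Split(text, "\n") {
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The metric name ends at the first '{' (labels) or ' ' (value).
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		if strings.Contains(name, "rcmgr_") {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Illustrative sample; real output comes from the endpoint described above.
	sample := "# HELP example\n" +
		"libp2p_rcmgr_connections{dir=\"inbound\",scope=\"system\"} 42\n" +
		"go_goroutines 17\n"
	for _, line := range filterRcmgrMetrics(sample) {
		fmt.Println(line)
	}
}
```

In practice the input text would be the response body fetched from `{Addresses.API}/debug/metrics/prometheus`, for example with `net/http`'s `http.Get`.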

A textual view of current resource usage and a list of services, protocols, and peers can be
obtained via `ipfs swarm stats --help`
