
A lot of time is spent determining the size of the pending/queued transaction pool #16717

Closed
veox opened this issue May 9, 2018 · 9 comments

@veox
Contributor

veox commented May 9, 2018

EDIT: Original title: A lot of time is spent determining the length of txSortedMap

System information

Geth version: Geth/v1.8.7-stable-66432f38/linux-amd64/go1.10
OS & Version: Ubuntu 16.04.4 LTS (Xenial Xerus)

Background

I've been trying to do txpool analysis, retaining as much of the pool as possible (e.g. running with 32768 slots). My assumption was that this would mostly require copious amounts of memory, but it seemed to increase CPU load, too.

I've since dropped the slot count to 8192 and run some profiling. (See the link at the very bottom.)

Possible optimisation?..

In core/tx_pool.go, TxPool's promoteExecutables() has a lot of nested loops that iterate over various lists of transactions.

Calls are made to txSortedMap's Len() (defined in core/tx_list.go); the structure does not seem to keep a dedicated count field, so the total is determined on the fly.

In my tests, 5-10% of a "production" node's time is spent determining the length of these maps. This seems to hold for almost any txpool settings, unless the pool is severely restricted via txpool.pricelimit.

(BTW, the default for that is 1 wei, which also contributes to the issue: the txpool is pretty much guaranteed to be filled up immediately by "default policy" peers, and to remain in a constant state of churn from then on.)

Since this probably can't be avoided, perhaps it can be optimised for?..
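
For context, the hot pattern being described looks roughly like this (a paraphrase, not a verbatim quote; the summation loop is the one quoted further down, and the GlobalSlots comparison is the check it feeds):

// Paraphrased from promoteExecutables() in core/tx_pool.go (v1.8.x),
// surrounding code trimmed: the pending total is re-derived by walking
// every per-account list and calling its Len().
pending := uint64(0)
for _, list := range pool.pending {
	pending += uint64(list.Len())
}
if pending > pool.config.GlobalSlots {
	// ... evict transactions from over-allocated accounts ...
}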

Steps to reproduce the behaviour

Run geth with something like:

/usr/bin/geth --metrics \
    --datadir /home/geth/.ethereum \
    --cache 3072 \
    --txpool.pricelimit 31337000 --txpool.accountslots 4 --txpool.globalslots 8192 --txpool.accountqueue 4 \
    --lightserv 50 --lightpeers 1000 --maxpeers 1025

In console:

debug.cpuProfile('/home/geth/fiddled.prof', 3600)

Generate SVG:

go tool pprof fiddled.prof
(pprof) svg > fiddled.svg
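
(The SVG can also be produced non-interactively, assuming a pprof build that accepts output-format flags:)

go tool pprof -svg fiddled.prof > fiddled.svg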

CPU profiling graph

https://veox.pw/dump/fiddled.svg

Note: this was intentionally timed to a window with little DB flushing.


Cc: @AlexeyAkhunov - not sure if you're mostly interested in sync optimisation, or in runtime as well.

@AlexeyAkhunov
Contributor

Looks like the Len() function is getting called quite a lot on the elements of pool.pending. It looks like pool.pending is not mutated inside promoteExecutables(), so a simple assignment to a variable should do the job.
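
A rough sketch of that suggestion (just the shape of it, not a patch; the pendingLens/pendingTotal names are made up):

// Sketch only: cache the per-account lengths and the running total once
// at the top of promoteExecutables(), since pool.pending is not mutated
// while the nested loops run.
pendingLens := make(map[common.Address]int, len(pool.pending))
pendingTotal := uint64(0)
for addr, list := range pool.pending {
	n := list.Len()
	pendingLens[addr] = n
	pendingTotal += uint64(n)
}
// ... later, consult pendingLens[addr] / pendingTotal instead of
// re-invoking list.Len() inside the nested loops.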

@ryanschneider
Contributor

This seems odd: txSortedMap.Len is just a wrapper around len(m.items), which appears to be a plain value lookup on h.count in the Go runtime:

https://golang.org/src/runtime/hashmap.go#L1232

My guess is it's actually the number of operations done on pool.pending, rather than the Len calls themselves.
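
For reference, the wrapper in core/tx_list.go is (approximately) just:

// Len returns the number of transactions in the sorted map.
func (m *txSortedMap) Len() int {
	return len(m.items)
}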

@veox
Contributor Author

veox commented May 10, 2018

I've done a bit of debug-print digging; most invocations are indeed made when establishing the pending map's total length - here:

for _, list := range pool.pending {
    pending += uint64(list.Len())
}

pool.pending is a hash map:

pending map[common.Address]*txList

mapping to txLists of likely different sizes.

Perhaps it is pool.pending that could benefit from a pool.pendingcount?..
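
To make that concrete, a hypothetical sketch (the pendingCount field is made up, and the real code has many more sites that mutate pool.pending than shown): keep a running counter next to the map and adjust it wherever a transaction enters or leaves a pending list, so the total becomes an O(1) read.

type TxPool struct {
	// ... existing fields ...
	pending      map[common.Address]*txList
	pendingCount uint64 // hypothetical: running total of txs across all pending lists
}

// Every site that inserts into a pending list would do pool.pendingCount++,
// every site that drops from one would do pool.pendingCount--, and the
// global-slots check in promoteExecutables() would become:
//
//	if pool.pendingCount > pool.config.GlobalSlots {
//		// evict as before
//	}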


EDIT: This section seems related to PR #16720, which might have made it obsolete.

The profiling graph doesn't show this, but the addTxs -> addTxsLocked -> promoteExecutables call path gets hit a lot with accounts passed in one by one, although this passage in addTxsLocked suggests that's not the only supported mode of operation:

addrs := make([]common.Address, 0, len(dirty))
for addr := range dirty {
	addrs = append(addrs, addr)
}
pool.promoteExecutables(addrs)

In other words, addTxsLocked receives a txs argument that often covers just one account, although the function is capable of processing lists of transactions from many accounts.
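
For illustration, the difference between the two call patterns (using the public AddRemotes entry point; error handling elided):

// One addTxs -> addTxsLocked -> promoteExecutables pass per transaction,
// roughly what the profile shows when peers deliver txs one at a time:
for _, tx := range txs {
	pool.AddRemotes([]*types.Transaction{tx})
}

// One pass covering every dirtied account at once:
pool.AddRemotes(txs)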

@veox veox changed the title A lot of time is spent determining the length of txSortedMap A lot of time is spent determining the size of the pending transaction pool May 10, 2018
@veox
Contributor Author

veox commented May 10, 2018

P.S. Although in current network conditions this manifests mostly with pool.pending, it is likely that with an enormous txpool it would be visible instead with pool.queue.

The crux of the issue is that these are "hash maps of (non-homogeneous) lists", and their size is determined by iterating over the elements.
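
That is, a built-in len() on the outer map is cheap but counts the wrong thing (sketch; the log.Info call is just illustrative):

accounts := len(pool.pending) // O(1), but counts accounts only
txs := uint64(0)              // counting transactions needs the walk:
for _, list := range pool.pending {
	txs += uint64(list.Len())
}
log.Info("txpool size", "accounts", accounts, "transactions", txs)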

@veox veox changed the title A lot of time is spent determining the size of the pending transaction pool A lot of time is spent determining the size of the pending/queued transaction pool May 10, 2018
@stale

stale bot commented May 30, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@veox
Contributor Author

veox commented May 30, 2019

AFAIK nothing has changed between v1.8.7 and v1.8.27: pool.pending is still iterated over, and it's still a hash map keyed by common.Address, with variable-length lists as values.

I haven't run profiling on v1.8.27, but would guess the inefficiency is still present.

@karalabe
Member

karalabe commented Nov 7, 2019

The 1.9 release family changed the way transactions are shuffled within the pool from a synchronous pass after every insert to an async one that runs in the background. This means that under high churn, the shuffling (and inherently the iterating) will run much more rarely. Could you perhaps check whether the latest release still exhibits this behavior? My initial guess would be that it does not.

@veox
Contributor Author

veox commented Nov 15, 2019

I'm not able to replicate the experiment completely (the machine has been "beefed up" resource-wise). Below is with geth v1.9.6.

However, the situation seems to have gotten "worse". Running:

/usr/bin/geth --nousb --datadir /home/geth/.ethereum \
    --v5disc --cache 6144 \
    --txpool.pricelimit 31337000 --txpool.globalslots 8192 \
    --txpool.accountslots 4 --txpool.accountqueue 4 \
    --ethstats "veox-geth-lightserv:$ETHSTATS_SECRET@ethstats.net" \
    --light.serve 100 --light.maxpeers 475 --maxpeers 500 \
    --metrics

Notice the inclusion of ethstats, which constantly queries the size of the txpool. txpool.pricelimit is the same as in the original experiment, so as to guarantee constant churn.

Profiling graph: https://veox.pw/dump/fiddled-v1.9.6.svg

It's now at 18.93% (another round shows 19.20%, another one 18.00%). Most likely that's an indicator of how much everything else has improved...


I still see no solution for this other than to have a len counter on txSortedMap, incrementing it as transactions are added and decrementing it as they are removed.

But that is patently unfriendly to multi-processing, and I don't know the proper Go approach. :(
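
For what it's worth, one common Go idiom for this (a sketch only, not a patch against geth; names are made up) is to guard the counter with sync/atomic. If all mutations already happen under the pool's existing mutex, a plain field updated under that lock would also do; atomics only matter for lock-free reads.

import "sync/atomic"

// Hypothetical counter, maintained alongside pool.pending:
var pendingCount int64

func onPromote(n int) { atomic.AddInt64(&pendingCount, int64(n)) }   // txs added to pending
func onDemote(n int)  { atomic.AddInt64(&pendingCount, -int64(n)) }  // txs removed from pending
func pendingLen() int64 { return atomic.LoadInt64(&pendingCount) }   // O(1), race-free read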

@holiman
Contributor

holiman commented Sep 10, 2020

We've made a lot of improvements lately:

I believe this is mostly fixed now, so closing. Please open a new ticket if this problem is still present.

@holiman holiman closed this as completed Sep 10, 2020